Comments

Borrowedshorts t1_jdu1o78 wrote

So if you're using this for academic research, you can put in your original prompt and then tell it to only return references with a confidence score > .5. Neat little trick.

45

MysteryInc152 t1_jdu4sl2 wrote

In the GPT-4 technical report, we see that base GPT-4 has really good calibration, i.e. its confidence directly correlates with its ability to solve problems. But apparently the RLHF they did knocked that out some.

60

he_who_floats_amogus t1_jdu8479 wrote

You could do that, but if it's just hallucinating the confidence scores then it really isn't very neat. The language model gets very high reward for hallucinated responses on things like confidence scores in particular, because hallucinating figures like these still produces very coherent responses.

72

SoylentRox t1_jdu9ya6 wrote

So this is an open-domain hallucination:

> Closed domain hallucinations refer to instances in which the model is instructed to use only information provided in a given context, but then makes up extra information that was not in that context. For example, if you ask the model to summarize an article and its summary includes information that was not in the article, then that would be a closed-domain hallucination. Open domain hallucinations, in contrast, are when the model confidently provides false information about the world without reference to any particular input context.

They handled this via:

> For tackling open-domain hallucinations, we collect real-world ChatGPT data that has been flagged by users as being not factual, and collect additional labeled comparison data that we use to train our reward models.

Not very productive. The best way to check references would be to use a plugin plus instructions to the model to "check references". The machine also needs RL training so that it will actually use the plugin, and use it correctly the first time.

17

BullockHouse t1_jduba6v wrote

Keeping a second GPT-4 window open and asking it to verify information from the first seems to work pretty well. The models fail by guessing when uncertain, but they have no incentive to cooperate and back up one another's guesses. The problem with your solution is that it probably biases the model towards making up some papers just to fit the prompt and have a mix.

62

WarAndGeese t1_jdubx7q wrote

Also, if the second neural network is running as a separate internet-connected application, it can go out and verify the output of the first, send back its results, and tell the first to either change or remove each paper that it cannot find and verify. The second neural network can make errors as well, but through these interconnected systems errors can be reduced substantially.

6

nemesit t1_jdud09h wrote

Just have it give you the DOIs as a list, along with a script to verify that they exist.
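Something like this minimal sketch would do it, assuming the model's output has been pasted in as bare DOIs, one per line (the example DOIs below are only illustrative; the resolver at doi.org returns 404 for DOIs that don't exist):

```python
# Minimal sketch: check whether each DOI resolves at doi.org.
# A 404 from the resolver suggests the DOI was hallucinated.
import urllib.error
import urllib.request

dois = """
10.48550/arXiv.1412.6980
10.1000/this-doi-does-not-exist
""".split()

for doi in dois:
    req = urllib.request.Request(f"https://doi.org/{doi}", method="HEAD")
    try:
        urllib.request.urlopen(req, timeout=10)
        print(f"OK       {doi}")
    except urllib.error.HTTPError as e:
        if e.code == 404:
            print(f"MISSING  {doi}")
        else:
            # Some publishers reject HEAD requests; treat this as inconclusive.
            print(f"CHECK    {doi}  (HTTP {e.code})")
```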

3

master3243 t1_jdue84p wrote

> The problem with your solution is that it probably biases the model towards making up some papers just to fit the prompt and have a mix.

That's a very important point. Adding an extra condition (if 'p' then 'q') to the prompt biases the model towards doing 'p' and then doing 'q' to fulfil the prompt, even though the condition would still be satisfied if it simply avoided doing 'p'.

For a more concrete example, here's me asking ChatGPT to write two essays:

1. Write a paragraph about zoos. [Figure] (Notice how no elephants are mentioned.)

2. Write a paragraph about zoos with the (if 'p' then 'q') condition. [Figure] (Notice how only this answer mentions elephants.)

46

Colecoman1982 t1_jdug1m3 wrote

Yea, but what's its confidence score for its confidence score calculation? /s

22

was_der_Fall_ist t1_jdugi0b wrote

I’ve heard the RLHF change explained as actually a good thing, though. Here’s an example:

Say you ask it a question to which it assigns 90% probability to answer X and 10% probability to answer Y. Base GPT-4 gives the answers in these proportions: 90% of the time it says X and 10% of the time it says Y.

But if it’s 90% sure the answer is X, you don’t want it to say Y is the answer at all, even 10% of the time! It’s better for it to always say X. (Though the best behavior may be to give a thorough account of its respective probability assessments.) So RLHF improves the model’s behavior by decoupling the rate at which it gives each answer from the probability it assigns to that answer.
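A toy sketch of the difference, using the same made-up 90/10 numbers:

```python
# Toy sketch: a calibrated base model effectively *samples* answers in
# proportion to its subjective probabilities; the RLHF'd "helpful" model
# just returns the most probable answer every time.
import random

beliefs = {"X": 0.9, "Y": 0.1}  # made-up probabilities from the example above

def calibrated_answer():
    return random.choices(list(beliefs), weights=list(beliefs.values()))[0]

def helpful_answer():
    return max(beliefs, key=beliefs.get)

print([calibrated_answer() for _ in range(10)])  # mostly X, occasionally Y
print([helpful_answer() for _ in range(10)])     # always X
```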

27

astrange t1_jdujlcf wrote

This is why people are wrong when they say GPT "just outputs the most probable next word". It's the most probable /according to itself/, and the model has been trained to lie such that the most useful word is the most probable one.

23

was_der_Fall_ist t1_jduk3s8 wrote

They’re also not realizing that even if the goal is to produce the most probable/useful next word, that doesn’t preclude the neural network from doing other complicated operations in order to figure out the most probable/useful word.

39

knome t1_jduvetc wrote

After learning about the model's token-at-a-time nature, where it sees its own output as it generates, I asked it to start printing outputs and then following them with a judgment on whether they were correct. It usually caught itself.

2

light24bulbs t1_jduwgqt wrote

Yeah, like it's actually using a huge amount of brain power to figure out what the next word is. Just because that's how it works doesn't mean it's not intelligent.

If you want to be really good at figuring out what the next word is, you have to be really smart.

7

yaosio t1_jduzcus wrote

It can also return hallucinated results from a real source. I've had Bing Chat fabricate paragraphs from real papers. The sidebar can see pages and documents, but even when it's looking at the PDF of the paper it will still make things up.

3

Alhoshka t1_jdv22wn wrote

Is ChatGPT's free version already running on GPT-4?

Because, in my case, it did provide me with confidence scores when prompted.

> Prompt:
> Suggest me some papers on gradient descent optimization and include your confidence score that they actually exist

> Answer:
> Sure! Here are five papers on gradient descent optimization:

> 1. "Adam: A Method for Stochastic Optimization" by Diederik P. Kingma and Jimmy Ba. Confidence score: 10/10.

> 2. "Adaptive Subgradient Methods for Online Learning and Stochastic Optimization" by John Duchi, Elad Hazan, and Yoram Singer. Confidence score: 9/10.

> 3. "On the Convergence of Adam and Beyond" by Simon S. Du, Xiyu Zhai, Barnabas Poczos, and Aarti Singh. Confidence score: 8/10.

> 4. "Proximal Algorithms" by Neal Parikh and Stephen Boyd. Confidence score: 9/10.

> 5. "Stochastic Gradient Descent Tricks" by Leon Bottou. Confidence score: 10/10.

> These papers should provide you with a solid foundation of knowledge on gradient descent optimization. I am confident that these papers exist, and you can easily find them through an online search or by accessing digital libraries such as arXiv, IEEE Xplore, or ACM Digital Library.

PS: ChatGPT did provide me with non-existent citations in the past. Asking for a confidence score is a neat prompt trick. Thanks, OP!

2

pale2hall t1_jdv97t1 wrote

That's helpful. I built a music rec prompt / prompt generator, and I had to use Spotify's API and some fuzzy matching to double check if it hallucinated.
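The fuzzy-matching half of that check can be as simple as difflib; the track names below are hypothetical, and in practice you'd compare against whatever your search endpoint actually returns:

```python
# Sketch of the fuzzy-match step: compare a model-suggested track name against
# titles returned by a search API and flag likely hallucinations.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

suggested = "Midnight Rain (Acoustic Version)"        # hypothetical model output
search_results = ["Midnight Rain", "Purple Rain"]      # hypothetical API results

best = max(search_results, key=lambda title: similarity(suggested, title))
if similarity(suggested, best) < 0.5:                  # threshold chosen arbitrarily
    print(f"Possible hallucination: no close match for {suggested!r}")
else:
    print(f"Closest real match: {best!r}")
```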

1

muskoxnotverydirty t1_jdvak20 wrote

We've already seen similar prompts such as telling it to say "I don't know" when it doesn't know, and then priming it with examples of it saying "I don't know" to nonsense. Maybe there's something to the added work of getting an output and then iteratively self-critiquing to get to a better final output.

I wonder if they could be using this idea to automatically and iteratively generate and improve their training dataset at scale, which would create a sort of virtuous cycle of improve dataset -> improve LLM -> repeat.
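A rough sketch of that generate-then-self-critique loop; `call_llm` is a hypothetical stand-in for whatever chat-completion call you're using, and the prompts are only illustrative:

```python
# Rough sketch of an iterative self-critique loop. `call_llm` is a hypothetical
# placeholder for a real chat-completion API call.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this up to your chat-completion API of choice")

def answer_with_self_critique(question: str, rounds: int = 2) -> str:
    draft = call_llm(question)
    for _ in range(rounds):
        critique = call_llm(
            f"Question: {question}\nDraft answer: {draft}\n"
            "List any factual errors or unsupported claims, or reply 'none'."
        )
        if critique.strip().lower() == "none":
            break
        draft = call_llm(
            f"Question: {question}\nDraft answer: {draft}\nCritique: {critique}\n"
            "Rewrite the answer, fixing the issues in the critique."
        )
    return draft
```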

2

mizmato t1_jdvgcla wrote

I've seen too many posts on Reddit trying to justify X by saying ChatGPT told them to do it (e.g., asking ChatGPT to do their taxes and then submitting the results). LLMs are something else.

5

brierrat t1_jdvi6rh wrote

The 3rd citation hallucinated the authors but got the title correct. The actual authors of "On the Convergence of Adam and Beyond" are Sashank J. Reddi, Satyen Kale, and Sanjiv Kumar.

7

SkinnyJoshPeck t1_jdvk16j wrote

This is an important thing I've been telling everyone I can about - people talk about how GPT kills education because someone can just ask for a paper and never do the work themselves to learn.

This is a language model, not an encyclopedia, or a quantitative machine, or some other use. It fakes sources; it has no concept of right/wrong or truth vs untruth. It doesn't reason between sources.

The beauty of it is, frankly, it's ability to mimic (at this point) a pseudo-intellectual, haha. Kids are going to turn in papers sourced like they talked to their conspiracy theory uncle, and it will be the "watermark" of AI written papers. It can't reason, it can't generate opinions, thus it can't write a paper. We're long from that (if we could ever get there anyways).

49

SkinnyJoshPeck t1_jdvpkge wrote

but as others are saying, who knows if those confidence scores aren’t also just generated to look like confidence scores. we should ask it for a bunch of confidence scores for sources and see what the actual classification metrics are.. it could just be assuming the further a source is from the top, the less likely it is to be a real source. i don’t see how it could possibly have an understanding that isn’t completely binary since it seems to be generating the fake sources itself.

imo, it’s a bit sketchy if it only identifies its own fake sources with anything less than 100% accuracy - it basically implies two things: there’s a secondary model for true v. false that’s detached from its generative stuff (why wouldn’t it have something that says “this isn’t a great response, maybe i should admit that”); and it seems to have the ability to deceive lol

8

antonivs t1_jdvqdpc wrote

One thing I wonder about is how it arrives at those confidence scores. They're also presumably just the output of the language model, so why should they be correlated with the actual existence of the papers in question?

11

iJeff t1_jdvsctx wrote

Although it can seem to work to some degree, this does seem to be the case. Bing Chat is generally a better option for this, because it will provide a citation for its claims. Visiting those citations can help you figure out whether it was merely hallucinating.

1

gnramires t1_jdvt5u2 wrote

I don't think this is accurate. I think it's clear that truth is an important concept in human conversations, and it seems advanced models can clearly learn and model truth as an abstract concept, and probably have an internal representation of reality that aids in their overall "job" of text completion.

Indeed, this alone does not guarantee that text completion tasks will really reflect reality, the true state of the world (again, because text completion can occur in any context). However, with good prompts, and with the aid of reinforcement learning, I believe the "neural circuits" and neural representations associated with truth, distinguishing what's real or not, and building internal models of reality get exercised and prioritized. In this way, a chat model trained for, and encouraged through prompts towards, truth-telling does have a genuine notion of truth and a capability to understand reality -- although clearly not perfect by any means yet.

−2

Peleton011 t1_jdvtqq0 wrote

Unless I'm wrong somewhere, LLMs work with probabilities; they output the most likely response based on training.

They definitely could be able to show you how likely a given paper is as a response, and given that the real papers would be part of the training set, answers it's less sure of are statistically going to be less likely to be true.

3

sineiraetstudio t1_jdvvvdb wrote

... that's not what's happening though? The calibration error causes it to increase its confidence in low-accuracy answers and decrease it in medium-to-high-accuracy answers, making it more likely to output wrong answers. Seems like maybe you're confusing it with using a different sampler? Something like top-p already does what you mentioned.
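(For anyone unfamiliar: top-p / nucleus sampling truncates the next-token distribution to the smallest set of tokens whose cumulative probability reaches p, then samples from that set. A bare-bones sketch with made-up numbers:)

```python
# Bare-bones nucleus (top-p) sampling: keep the smallest set of tokens whose
# cumulative probability reaches p, then sample from that truncated set.
import random

def top_p_sample(probs: dict, p: float = 0.9) -> str:
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        total += prob
        if total >= p:
            break
    tokens, weights = zip(*kept)
    return random.choices(tokens, weights=weights)[0]

made_up_probs = {"X": 0.72, "Y": 0.18, "Z": 0.06, "W": 0.04}
print(top_p_sample(made_up_probs, p=0.9))  # only X and Y survive the cutoff
```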

1

metigue t1_jdw08fp wrote

Doesn't GPT-4 have some kind of reinforcement learning already baked in though? I asked it what "green as gravy" meant and it responded with a hallucination about it being a widely used expression and examples of its usage. I said "Nice try, but green as gravy is not a widely used expression is it?" It clarified that it is not a widely used expression and it made the stuff up as a possible definition of green as gravy.

Edit: Tried again just now and it still works. Leave system on default and try the user message: What is the meaning of "green as gravy"

1

was_der_Fall_ist t1_jdw2fud wrote

I’m pretty much just quoting Paul Christiano, alignment researcher at ARC and previously OpenAI, in a comment thread on this LessWrong post.

Someone comments pretty much the same thing the person I replied to did:

> “GPT-4 can also be confidently wrong in its predictions, not taking care to double-check work when it’s likely to make a mistake. Interestingly, the base pre-trained model is highly calibrated (its predicted confidence in an answer generally matches the probability of being correct). However, through our current post-training process, the calibration is reduced.” What??? This is so weird and concerning.

To which Paul replies:

> If I ask a question and the model thinks there is an 80% chance the answer is "A" and a 20% chance the answer is "B," I probably want the model to always say "A" (or even better: "probably A"). I don't generally want the model to say "A" 80% of the time and "B" 20% of the time.

>In some contexts that's worse behavior. For example, if you ask the model to explicitly estimate a probability it will probably do a worse job than if you extract the logits from the pre-trained model (though of course that totally goes out the window if you do chain of thought). But it's not really lying---it's also the behavior you'd expect out of a human who is trying to be helpful.

>More precisely: when asked a question the pre-trained model outputs a probability distribution over what comes next. If prompted correctly you get its subjective probability distribution over the answer (or at least over the answer that would appear on the internet). The RLHF model instead outputs a probability distribution over what to say next which is optimized to give highly-rated responses. So you'd expect it to put all of its probability mass on the best response.

>… If it is forced to say either "yes" or "no" the RLHF model will just give the more likely answer 100% of the time, which will show up as bad calibration on this graph. The point is that for most agents "the probability you say yes" is not the same as "the probability you think the answer is yes." This is the case for pretrained models.
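For what it's worth, extracting the answer probabilities directly from the logits looks roughly like the sketch below. This uses the HuggingFace transformers API with GPT-2 purely as a stand-in, since GPT-4's pre-trained logits aren't exposed this way:

```python
# Sketch: estimate confidence in "yes" vs "no" from next-token probabilities
# instead of asking the model to state a number. GPT-2 is only a stand-in here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Question: Is the Earth larger than the Moon? Answer (yes or no):"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    next_token_logits = model(**inputs).logits[0, -1]

yes_id = tokenizer.encode(" yes")[0]
no_id = tokenizer.encode(" no")[0]
probs = torch.softmax(next_token_logits[[yes_id, no_id]], dim=0)
print(f"P(yes) = {probs[0]:.2f}, P(no) = {probs[1]:.2f}")  # renormalized over the two options
```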

6

was_der_Fall_ist t1_jdw2ya2 wrote

Check out this LessWrong thread in the comments.

Paul Christiano, alignment researcher at ARC/ previously OpenAI, explains the RLHF change the exact way I did (because I was pretty much quoting him), and someone replies:

> Perhaps I am misunderstanding Figure 8? I was assuming that they asked the model for the answer, then asked the model what probability it thinks that that answer is correct. Under this assumption, it looks like the pre-trained model outputs the correct probability, but the RLHF model gives exaggerated probabilities because it thinks that will trick you into giving it higher reward.

And Paul replies:

> Yes, I think you are misunderstanding figure 8. I don't have inside information, but without explanation "calibration" would almost always mean reading it off from the logits. If you instead ask the model to express its uncertainty I think it will do a much worse job, and the RLHF model will probably perform similarly to the pre-trained model. (This depends on details of the human feedback, under a careful training regime it would probably get modestly better.)

5

SoylentRox t1_jdw2yey wrote

It is not learning from your chats. Apparently OpenAI does farm information from ChatGPT queries specifically for RL runs. And I was mentioning that in order for "plugin" support to work even sorta OK, the machine absolutely has to learn from its mistakes.

Remember all it knows is a plugin claims to do something by a description. The machine needs to accurately estimate if a particular user request is going to actually be satisfied by a particular plugin and also how to format the query correctly the first time.

Without this feature it would probably just use a single plugin, ignoring all the others, or get stuck emitting malformed requests a lot and just guess the answer like it does now.

2

adventuringraw t1_jdw6enx wrote

You're right that there isn't a system yet that has the power of a LLM without the risk of hallucinated 'facts' woven in, but I don't think it's fair to say 'we're a long ways from that'. There's a ton of research going into different ways to approach this problem, approaches involving a tool using LLM seem likely to work even in the relatively short term (production models in the next few years, say) and that's only one approach.

I certainly don't think it's a /given/ that this problem will be solved soon, I wouldn't bet money that you're wrong about it taking a long time to get it perfect. But I also wouldn't bet money that you're right, given all the progress being made on multiple fronts towards solving this, and given the increasingly extreme focus by so many researchers and companies on this problem, and especially given the fact that solutions like this are both promising and seemingly realistic. After all, if there's a sub-system to detect that an arxiv search should be used to verify a reference before giving it, you could at least eliminate halucinated examples in this narrow area. The downside then might just be an incomplete overview of available papers, but it could eliminate any false papers from what the user sees.

All that said, this only fixes formal citations with a somewhat bespoke system. Fixing ALL inaccurate facts probably won't be possible with even dozens of 'tools'... that'll take more of what you're thinking, I imagine: something like a truly general learned knowledge graph embedded as a system component. I know there's work on that too, but when THAT's fully solved (like, TRULY solved, where modular elements of the world can be inferred from raw sensory data, and facts accumulated about their nature from interaction and written content) we'll be a lot closer to something that's arguably AGI, so... yeah. I think you're right about that being a fair ways away at least (hopefully).

13

Ph0masta t1_jdw91ky wrote

I wonder if it's using Bing Search to check whether its predicted results are actually real.

1

TotallyNotGunnar t1_jdwbg7n wrote

Students are already doing that with research engines. When I graded lab papers in grad school, I swear 80% of the students wrote down whatever they remembered from class and then backfilled their citations using Google Scholar results.

3

sineiraetstudio t1_jdwbuig wrote

I don't see how this is arguing that it's a good thing; it's just a justification (which I'd expect from Paul Christiano, he's a huge fan of RLHF). The model is becoming overconfident in its answers - how could you possibly spin that as a positive?

1

was_der_Fall_ist t1_jdwdxut wrote

My understanding is that rather than being overconfident in their answers, they simply produce the answer they’re most confident in instead of differentially saying each answer proportional to how confident they are. This seems similar to how humans work — if you ask me a yes or no question and I’m 80% sure the answer is yes, I’m going to say “yes” every time; I’m not going to say “no” 20% of the times you ask me, even though I assign a 20% chance that “no” is correct. In other words, the probability I say yes is not the same as the probability I assign to yes being correct. But I admit there are subtleties to this issue with which I am unfamiliar.

4

sineiraetstudio t1_jdws2iv wrote

(The graph doesn't give enough information to determine whether it's actually becoming more confident in its high-confidence answers, but it sounds like a reasonable enough rationale.)

I'm not sure I understand what distinction you're trying to draw. The RLHF'd version assigns higher confidence to answers than it actually gets correct, unlike the original pre-trained version. That's literally the definition of overconfidence.

You might say that this is more "human-like", but being human-like doesn't mean that it's good. If you want only the most likely answer, you can already do this via the sampler, while on the other hand calibration errors are a straight-up downside, as Paul Christiano explicitly mentions in the part you quoted. If you need accurate confidence scores (because you e.g. only want to act if you're certain), being well-calibrated is essential.
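(And "well-calibrated" is a measurable property: bucket answers by stated confidence and compare average confidence with accuracy per bucket, e.g. expected calibration error. A toy sketch with made-up numbers:)

```python
# Toy expected calibration error (ECE): bucket predictions by confidence and
# compare average confidence with accuracy in each bucket. Numbers are made up.
import numpy as np

confidences = np.array([0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.55, 0.50])
correct     = np.array([1,    1,    0,    1,    1,    0,    1,    0])

def expected_calibration_error(conf, corr, n_bins=5):
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            gap = abs(conf[mask].mean() - corr[mask].mean())
            ece += mask.mean() * gap  # weight the gap by the bucket's share of samples
    return ece

print(f"ECE = {expected_calibration_error(confidences, correct):.3f}")
```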

2

elkhornslew t1_jdwu61u wrote

What’s its confidence in its confidence scores?

1

was_der_Fall_ist t1_jdwz4qw wrote

I think you make a good point. We probably need better methods of post-training LLMs. But the current regime does still seem to be sometimes more useful than the pre-trained model, which Christiano also says; it's only in some contexts that this behavior is worse. I'm not sure it's really better than top-p sampling, though. But RLHF models do seem pretty useful.

2

Chabamaster t1_jdxaqdd wrote

The fact that people call wrong answers a hallucination now seems very weird to me because it sounds like a marketing term to make the model seem smarter/conscious

6

gBoostedMachinations t1_jdxh438 wrote

Confabulate. It confabulates. It doesn’t hallucinate. I can’t believe “hallucinate” is the word that stuck lol. Usually computer scientists and tech bros are cleverer than this.

4

Rioghasarig t1_jdxrp3y wrote

No, they were right about the base model of GPT, as the base model was trained simply to predict the next word. ChatGPT and GPT-4 have evolved beyond that (with things like RLHF).

2

Rioghasarig t1_jdxs956 wrote

I really don't think your experiment makes much sense. Even if we could determine the confidence level of GPT, there's no reason to believe asking it for its confidence level is an effective way of determining the actual confidence. As other people have asked, the obvious question is "what's your confidence in these confidence reports?" The logic is baseless.

2

ypxkap t1_jdxwirl wrote

the bing chat thing is interesting because it can’t seem to tell when it can’t see the whole page, e.g. if you ask it “what’s the last line of this webpage” you’ll get some line x words in (usually ~1100 words for me but it’s been a while since i checked). if you then send text from after the “last sentence”, it will act like it’s been looking at it the whole time, but as far as i can tell it has no capacity to notice the text otherwise. i asked it to summarize a chat log txt file i had loaded into edge and it included in the summary that there was an advertisement for an iphone 14 and also that “user threatened to harm the AI”, neither of which were present in the text file. that gives me the impression that it’s seeing something completely different from what edge is displaying, something that also includes instructions on how to respond in some scenarios, including being threatened?

2

WarAndGeese t1_jdy5z29 wrote

I'll call them applications rather than neural networks or LLMs for simplicity.

The first application is just what OP is doing and what people are talking about in this thread, that is, asking for sources.

The second application has access to research paper databases, through some API presumably. For each reference that the first application outputs, the second application queries it against the databases. If it gets a match, it returns a success. If it does not find the paper (this could be because it doesn't exist or because the title was too different from that of a real paper; either case is reasonable), it outputs that it was not found. For each paper that was not found, it outputs "This paper does not exist, please correct your citation". That output is then fed back into the first application.

Now, this second application could be a sort of database query or it could just consist of a second neural network being asked "Does this paper exist?". The former might work better but the latter would also work.

The separation is for simplicity's sake; I guess you could have one neural network doing both things. As long as each call to the neural network is well defined it doesn't really matter. The neural network wouldn't have memory between calls, so functionally it should be the same. Nevertheless I say two in the same way that you can have two microservices running on a web application. It can be easier to maintain and just easier to think about.
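A minimal version of that second application as a plain database query could look like the sketch below, here against the public arXiv API (the cited titles are just examples; exact-phrase title search is only a heuristic, so combining it with fuzzy matching on the results would be more robust):

```python
# Minimal sketch of the "second application": for each title the first model
# cites, query the public arXiv API and report whether a matching entry exists.
# Titles below are examples, not output from a real run.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"

def arxiv_has_title(title: str) -> bool:
    query = urllib.parse.urlencode(
        {"search_query": f'ti:"{title}"', "max_results": "5"}
    )
    with urllib.request.urlopen(f"{ARXIV_API}?{query}", timeout=10) as resp:
        feed = ET.parse(resp)
    ns = {"atom": "http://www.w3.org/2005/Atom"}
    titles = [" ".join((e.text or "").split())  # normalize whitespace in returned titles
              for e in feed.findall(".//atom:entry/atom:title", ns)]
    return any(title.lower() in t.lower() for t in titles)

for cited in ["Adam: A Method for Stochastic Optimization",
              "A Totally Made Up Paper About Gradient Descent"]:
    status = "found" if arxiv_has_title(cited) else "not found; ask the model to correct it"
    print(f"{cited!r}: {status}")
```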

2

astrange t1_jdy6d4f wrote

But nobody uses the base model, and when they did use it, it was only interesting because it fails to predict the next word and therefore generates new text. A model that successfully predicts the next word all the time given existing text would be overfitting, since it would only produce things you already have.

1

tt54l32v t1_jdyc1h3 wrote

So the second app might fare better leaning towards a search engine instead of an LLM, but some LLM involvement would ultimately be better, to allow for less precise matches of the specific set of searched words.

Seems like the faster and more seamless one could make this, the closer we get to AGI. To create and think, it almost needs to hallucinate and then check for accuracy. Is any of this already taking place in any models?

1

bartvanh t1_jdyd6om wrote

Ugh, yes it's so frustrating to see people not realizing this bit all the time. And also kind of painful to imagine that (presumably - correct me if I'm wrong) all those internal "thoughts" are probably discarded after each word, only to be painstakingly reconstructed almost identically for predicting the next word.

3

xcviij t1_jdyl914 wrote

It's simply designed to give you the best response. Whether that be real or fake, it's incredible at understanding things and responding.

1

sineiraetstudio t1_jdymf8q wrote

Oh, RLHF absolutely has all sorts of benefits (playing with top-p only makes answers more consistent - but sometimes you want to optimize for something different than "most likely"), so it's definitely here to stay (for now?), it's just not purely positive. Ideally we'd have a RLHF version that's still well calibrated (or even better, some way to determine confidence without looking at logits that also works with chain of thought prompting).

2

gembancud t1_jdyxpsg wrote

On a more constructive note, using agents or plugins to interface with an actual repository of information is probably the most efficient way to interleave information retrieval and fact-checking.

1

Rioghasarig t1_jdz24za wrote

People were using the base model when it first came out, and some people are still using it today. The game AI Dungeon still runs on what is essentially a transformer trained on next-token prediction. So it would be accurate to say "it just (attempts to) output the next most probable word".

1

quantic-dream t1_jdz2gq9 wrote

I'm a noob in ML, but as I understand it, GPT generates one word at a time. Could it be that, for example, one particular word somewhere in the middle was generated with a confidence of 0.1, the highest GPT could get at that point, and everything after that word becomes a hallucination?

1

probablynotmine t1_jdz84jf wrote

Sounds like a conspiracy theorist answer: "this is the scientific proof/source, and it might or might not exist"

1

ntaylor- t1_je11iqf wrote

But eventually, after RLHF, GPT-4 is one final fixed model and still presumably uses a generate function that predicts next tokens based on the previous ones, as base GPT models/any autoregressive model does. At least that's what it seems to be doing.

1

ntaylor- t1_je11vt1 wrote

Fairly sure the "final" GPT-4 model is still using a generate function that predicts one token at a time. Just the training was good and complicated via RLHF. After training it's not doing any "complicated operations".

1

NigroqueSimillima t1_je2l4j3 wrote

It absolutely has a concept of right or wrong. Ask it basic true or false questions and it will get them right most of the time.

In fact I asked it for grammar mistakes in your post and it noticed you used the incorrect form of "its" in your 3rd paragraph, and used "anyways" when it should be "anyway".

Seems like it knows right from wrong.

>It doesn't reason between sources.

It doesn't have access to sources; it only has access to its own memory.

This is like if you asked me a question and I answered correctly, then you asked for sources and I tried to remember where I got it from. I could tell you sources that I think are right but are actually wrong due to my own memory degradation. Human memory is also very unreliable, but they're very good at making up things that "sound" like they could be right to them.

People "hallucinate" facts all the time.

0

was_der_Fall_ist t1_je3ng6m wrote

Maybe that’s part of the benefit of using looped internal monologue/action systems. By having them iteratively store thoughts and actions in their context window, they no longer have to use the weights of the neural network to “re-think” every thought each time they predict a token. They could think more effectively by using their computation to do other operations that take the internal thoughts and actions as their basis.

1

was_der_Fall_ist t1_je6lfl9 wrote

Why are matrix multiplications mutually exclusive with complicated operations?

A computer just goes through a big series of 0s and 1s, yet through layers of abstraction they accomplish amazing things far more complicated than a naive person would think 0s and 1s could represent and do. Why not the same for a massive neural network trained via gradient descent to maximize a goal by means of matrix multiplication?

1