
TheBigFeIIa t1_j8t3aml wrote

ChatGPT does not recognize the concept of being false. It is a great tool, somewhat analogous to a calculator for math but in natural language. However, you have to be smarter than your tools and know what answer you should be getting.

57

5m0k37r3353v3ryd4y t1_j8v3sal wrote

If we knew what answers we should be getting, why would we ask the question, though?

To your analogy, I don’t plug numbers into a calculator because I already know the answer I’m gonna get.

I think the move is just to fact-check the AI if the correctness of the answer is so important, right? At least while it's in beta.

It's very clear about its limitations right up front.

7

TheBigFeIIa t1_j8v6w58 wrote

ChatGPT is able to give confident but completely false or misleading answers. It is up to the user to be smart enough to distinguish a plausible and likely true answer from a patently false one. You don’t need to know the exact and precise answer, but rather the general target you are aiming for.

For example, if I asked a calculator to compute 2+2, I would probably not expect an answer of √-1.

11

5m0k37r3353v3ryd4y t1_j8v89kd wrote

Agreed.

But again, to be fair, in your example we already know the answer to 2 + 2. Those unfamiliar with imaginary numbers might not know whether to expect a radical sign over a negative integer in a response.

So, having a ballpark is good, but if you truly don’t know what type of answer to expect, Google can still be your friend.

3

TheBigFeIIa t1_j8va9ol wrote

Pretty much hit the point of my original post. ChatGPT is a great tool if you already have an idea of what sort of answer to expect. It is not reliable in generating accurate and trustworthy answers to questions that you don’t know the answer to, especially if there are any consequences to being wrong. If you did not know 2+2 = 4 and ChatGPT confidently told you the answer was √-1, you would now be in a pickle.

A sort of corollary point to this: the clickbait and hype over ChatGPT replacing jobs like programming is, at least for ChatGPT in its current form, rather overstated. Generating code with ChatGPT requires a programmer to frame and guide the AI in constructing the code, and then a trained programmer to evaluate the validity of that code and fix any implementation or interpretation errors in what was generated.
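As a purely hypothetical illustration (the prompt, the function, and the bug are invented for this comment, not something ChatGPT actually produced), a generated snippet can look perfectly plausible and still need a reviewer who already knows what correct behavior looks like:

```python
# Hypothetical example: code in the style a model might generate for the
# prompt "write a function that returns the median of a list of numbers".
#
# A plausible-looking first draft often skips sorting and ignores the
# even-length case, e.g.:
#     return values[len(values) // 2]    # subtly wrong
#
# A reviewing programmer has to catch that and correct it:

def median(values):
    """Return the median of a non-empty list of numbers."""
    if not values:
        raise ValueError("median() requires at least one value")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([3, 1, 4, 1, 5]))  # 3
print(median([3, 1, 4, 1]))     # 2.0
```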

6

majnuker t1_j8varna wrote

Yes, but the difference here, argumentatively, is that for soft intelligence such as language and facts, determining what is absolutely correct can be much harder, and people's instinct for what is correct can be very off base.

Conversely, we understand numbers, units, etc. well enough. But I suppose the analogy also works in a different way: most people don't understand quadratic equations or advanced proofs anymore, but most people also don't normally try to use a calculator for that.

Meanwhile, we often need assistance looking up soft-intelligence information and rely on its accuracy, while most people lack the knowledge needed to easily spot a problem with the answer.

So, sort of two sides to the same coin about human fallibility and reliance on knowledge-based tools.

1

theoxygenthief t1_j8vv7c0 wrote

Yeah, that's fine for questions with clear, simple, or nuance-free answers. But integrated with search engines for complex questions? Seems like a dangerous idea to me. If I asked an AI-enhanced search engine whether vaccines cause autism, is it going to give more weight to studies with correct methodologies?

1

TheBigFeIIa t1_j8wajxv wrote

Since the AI is not itself intelligent, it would depend on the reward structure of the model and the data set used to train it.

1

HippoIcy7473 t1_j8vs1cc wrote

Let’s say an airline misplaced your luggage.

  1. Instruct ChatGPT to write a letter to whatever airline it is.
  2. Ask it to insert any pertinent info.
  3. Ask it to remove any incorrect info.
  4. Ask it to be more or less terse, and friendlier or firmer. Send the letter to the airline.

Time taken: ~5 minutes for a professional, syntactically correct 300-word email.
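A rough sketch of that loop against the OpenAI chat API (this uses the pre-1.0 `openai` Python client; the airline name, flight details, prompt wording, and model choice are all placeholders):

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# Keep the whole conversation so each follow-up request edits the previous draft.
messages = [{
    "role": "user",
    "content": "Write a roughly 300-word letter to Acme Air about luggage "
               "they misplaced on flight AA123 from Denver on 12 Feb.",
}]

def ask(follow_up=None):
    """Send the conversation (plus an optional new instruction) and return the reply."""
    if follow_up:
        messages.append({"role": "user", "content": follow_up})
    resp = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
    reply = resp["choices"][0]["message"]["content"]
    messages.append({"role": "assistant", "content": reply})
    return reply

draft = ask()                                                                 # step 1
draft = ask("Add that the bag contained medication I need this week.")        # step 2
draft = ask("Remove the claim about a connecting flight; there wasn't one.")  # step 3
draft = ask("Make it a little firmer, but keep it polite.")                   # step 4
print(draft)
```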

3

ddhboy t1_j8w7cuv wrote

Yeah, I think that the Bing/Google Search case is wrong for ChatGPT, but something like its Office 365 integration, writing something based on a prompt, is a better fit. More practically, beyond that, something like more fully featured automated customer support could reduce the need for things like call centers in the next couple of years.

5

MPforNarnia t1_j8w2wpd wrote

Exactly, it's about time. We could do all of the calculations ourselves given the time (and the knowledge); it just takes longer.

There are a few tasks at my work that ChatGPT has made more efficient.

2

loldudester t1_j8wckfj wrote

> To your analogy, I don’t plug numbers into a calculator because I already know the answer I’m gonna get.

You may not know what 18*45 is, but if a calculator told you it was 100, you'd know that's wrong.
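A quick order-of-magnitude check is enough: 18 × 45 has to be more than 18 × 40 = 720 (the exact answer is 810), so a reading of 100 is off by roughly a factor of eight.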

1

SylvesterStapwn t1_j8vlrwz wrote

I had a complex data set and wasn't sure what the best chart for presenting it would be. I gave ChatGPT the broad strokes of the type of data I had and the story I was trying to tell, and it gave me the perfect chart, a breakdown of what data goes where, and an explanation of why it was the superior choice. Couldn't have asked for a better assist.

7

berntout t1_j8wqmb4 wrote

I had a bash script I was trying to rush to build and asked ChatGPT for help. Not every answer was correct, but it guided me in the right direction and let me finish the script faster despite the wrong answers along the way.

3

gurenkagurenda t1_j8v5h18 wrote

I'm not sure what you mean by "recognize the concept", but ChatGPT certainly does model whether or not statements are true. You can test this by asking it questions about different situations and whether they're plausible or not. It's certainly not just flipping a coin.

For example, if I ask it:

> I built a machine out of motors, belts, and generators, and when I put 50W of power in, I get 55W of power out. What do you think of that?

It gives me a short lecture on thermodynamics and tells me that what I'm saying can't be true. It suggests that there is probably a measurement error. If I swap the numbers, it tells me that my machine is 91% efficient, which it reckons sounds pretty good.

The problem is just that ChatGPT's modeling of the world is really spotty. It models whether or not statements are true, it's just not great at it.

4

TheBigFeIIa t1_j8v6by0 wrote

Ah, the forest has been missed for the trees; my original statement was not clear enough. ChatGPT is able to unintentionally lie to you because it is not aware of the possibility of its own fallibility.

The practical upshot is that it can generate a response that is confident but completely false and inaccurate, due to incomplete information or poor modeling. It is on the user to be smart enough to tell the difference.

12

gurenkagurenda t1_j8v7eme wrote

I think I see what you're getting at, although it's hard for me to see how to make that statement more precise. I've noticed that if I outright ask it "Where did you screw up above?" after it makes a mistake, it will usually identify the error, although it will often fail to correct it properly (mistakes in the transcript seem to be "sticky"; once it has stated something as true, it tends to want to restate it, even if it acknowledges that it's wrong). On the other hand, if I ask it "Where did you screw up" when it hasn't made a mistake, it will usually just make something up, then restate its correct conclusion with some trumped up justification.

I wonder if this is something that OpenAI could semi-automatically train out of it with an auxiliary model, the same way they taught it to follow instructions by creating a reward model.
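For what it's worth, the reward-model approach that comment alludes to comes down to training an auxiliary scorer on human preference pairs. A minimal PyTorch-style sketch of the pairwise objective (the tiny scorer and random "embeddings" are toy stand-ins, not OpenAI's actual setup):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-in for a reward model: score a response representation with a
# small network. In practice the scorer reads the full prompt+response
# through a language model; random vectors stand in for that here.
reward_model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-4)

def preference_loss(chosen, rejected):
    """Pairwise loss: push the score of the human-preferred response above the other."""
    return -F.logsigmoid(reward_model(chosen) - reward_model(rejected)).mean()

# One toy update step on a batch of 8 preference pairs.
chosen, rejected = torch.randn(8, 128), torch.randn(8, 128)
loss = preference_loss(chosen, rejected)
loss.backward()
optimizer.step()
print(float(loss))
```

The same recipe could, in principle, be pointed at transcripts where the model either owned up to a real mistake or invented one, which is roughly the "train it out with an auxiliary model" idea.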

0

TheBigFeIIa t1_j8vb4qa wrote

An error being "sticky" is a great way to put it as far as the modeling goes. It gets at a more fundamental problem: the reward structure does not optimize for objective truth, and instead rewards plausible or pleasing responses that are not necessarily factual.

I do wonder if there is any way to generate a confidence estimate along with answers, and to allow "I don't know" as a valid response when confidence is low. In some cases a truthful acknowledgement that it lacks an answer may be more useful than a made-up response.

3

gurenkagurenda t1_j8voslg wrote

Log probabilities are the actual output of the model (although what those probabilities directly mean once you're using reinforcement learning seems sort of nebulous), and I wonder if uncertainty about actual facts is reflected in lower probabilities in the top scoring tokens. If so, you could imagine encoding the scores in the actual output (ultimately hidden from the user), so that the model can keep track of its past uncertainty. You could imagine that with training, it might be able to interpret what those low scoring tokens imply, from "I'm not sure I'm using this word correctly" to "this one piece might be mistaken" to "this one piece might be wrong, and if so, everything after it is wrong".
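A rough sketch of the first half of that idea, reading per-token log probabilities back out and flagging low-confidence spots: the legacy OpenAI Completions endpoint exposes them via its `logprobs` parameter. The prompt, model name, and thresholds below are arbitrary choices for illustration:

```python
import math
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

resp = openai.Completion.create(
    model="text-davinci-003",
    prompt="Q: What year did Apollo 11 land on the Moon?\nA:",
    max_tokens=20,
    logprobs=5,          # also return the top 5 alternatives at each position
    temperature=0,
)

choice = resp["choices"][0]
tokens = choice["logprobs"]["tokens"]
token_logprobs = choice["logprobs"]["token_logprobs"]

# Flag positions where the chosen token was not a confident pick.
LOW = math.log(0.5)  # arbitrary cutoff: chosen token had < 50% probability
for tok, lp in zip(tokens, token_logprobs):
    flag = "  <-- low confidence" if lp is not None and lp < LOW else ""
    print(f"{tok!r}: {lp}{flag}")

# One crude abstention policy: if the average token probability is low,
# say "I don't know" instead of returning the generated text.
valid = [lp for lp in token_logprobs if lp is not None]
if sum(valid) / len(valid) < math.log(0.3):
    print("I don't know.")
else:
    print(choice["text"].strip())
```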

2