EmmyNoetherRing t1_j6i8xfv wrote

I hate to say it, but I think the actual answer to “as compared to what” is “as compared to my human professor”.

People using it to learn are having interactions that mimic interactions with teachers/experts. When they mention hallucinations, I think it’s often in that context.

4

mettle OP t1_j6im95b wrote

this is true so far, it would seem.

you'd think there'd be some clever folks trying to quantify things better.

1

EmmyNoetherRing t1_j6j7zq4 wrote

I wouldn’t mind being one of those folks. But you make a good point that the old rubrics may not be capturing it.

If you want to nail down what users are observing when they compare it to human performance, practically speaking you may need to shift to diagnostics that were designed to evaluate human performance, with the added challenge of avoiding tests whose answer sheets are already in its training data.

1