Submitted by mettle t3_10oyllu in MachineLearning
mettle OP t1_j6ilm6q wrote
Reply to comment by Jean-Porte in [Discussion] ChatGPT and language understanding benchmarks by mettle
Is there some alternative benchmark that measures factual accuracy of output?
Or is that impossible to create and use because any model would overfit to that data?
Jean-Porte t1_j6imfho wrote
LAMA, TruthfulQA, MMLU, and many others
mettle OP t1_j6imy6h wrote
perfect, thank you!