mettle OP t1_j6ilm6q wrote on January 30, 2023 at 4:27 PM

Is there some alternative benchmark that measures factual accuracy of output?

Or is such a benchmark impossible to create and use, because any model would overfit to that data?

Jean-Porte t1_j6imfho wrote on January 30, 2023 at 4:32 PM

LAMA, TruthfulQA, MMLU, and many others

Perfect, thank you!