mettle OP t1_j6ilm6q wrote

Is there some alternative benchmark that measures factual accuracy of output?

Or is such a benchmark impossible to create and use, because any model would overfit to its data?
