Jean-Porte t1_j6hif9e wrote

T5 is fine-tuned on supervised classification: it is explicitly trained to output labels. That's why it outperforms GPT-3.

Generative models are not as good as discriminative models on discriminative tasks. A carefully tuned DeBERTa is probably better than ChatGPT. But ChatGPT has a user-friendly text interface, and GLUE-type evaluation is not charitable to ChatGPT's capabilities: the model might internally store the answer yet be misaligned with the benchmark's expected outputs.
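That misalignment point can be illustrated with a toy example (made-up logits, not from any real model): a discriminative head only distributes probability over the task's label set, while a generative model's next-token distribution covers an open vocabulary, so exact-match scoring against the benchmark's label strings can fail even when the model "knows" the answer.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a dict of name -> logit."""
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}

# Discriminative head: scores only the task's two labels (toy logits).
clf = softmax({"positive": 2.1, "negative": -0.5})

# Generative model: next-token distribution over an open vocabulary.
# The sentiment is "stored" (mass on "great"/"good"), but the exact
# benchmark string "positive" gets little probability mass.
gen = softmax({"positive": 0.2, "negative": -1.0,
               "great": 1.8, "good": 1.5, "the": 0.7})

best_label = max(clf, key=clf.get)  # "positive" -> matches the benchmark
best_token = max(gen, key=gen.get)  # "great"    -> exact-match scores it wrong
```

Prompt engineering or constrained decoding (scoring only the label strings) can recover some of this gap, which is part of why raw benchmark numbers undersell generative models.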

I always wonder why we don't try to scale up discriminative models. DeBERTa-xxlarge is "only" 1.3B parameters, yet it outperforms T5 13B.

17

mettle OP t1_j6ilm6q wrote

Is there some alternative benchmark that measures factual accuracy of output?

Or is that impossible to create and use, because any model would just overfit to that data?

1