
KD_A OP t1_jegfh7i wrote

Great question! I have no idea lol.

More seriously, it depends on what you mean by "compare". CAPPr with a powerful GPT-3+ model is likely going to be more accurate than bart-large-mnli. But you need to pay to hit OpenAI endpoints, so it's not a fair comparison IMO.

If you can't pay to hit OpenAI endpoints, then a fairer comparison would be CAPPr + GPT-2 (specifically, the smallest GPT-2 on Hugging Face, or whatever's closest in inference speed to bart-large-mnli). But another issue pops up: GPT-2 wasn't explicitly trained on the NLI/MNLI task the way bart-large-mnli was. So I'd need to fine-tune GPT-2 (small) on MNLI to make the comparison fair.
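To make that comparison concrete, here's an untested sketch of both sides. The bart-large-mnli side uses the standard transformers zero-shot pipeline; for the CAPPr side I'm assuming cappr.huggingface.classify.predict takes prompts, completions, and a (model, tokenizer) pair (double-check against CAPPr's docs). The text, prompt template, and labels are made-up examples:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from cappr.huggingface.classify import predict

text = "The movie was a fantastic watch from start to finish."
labels = ["positive", "negative", "neutral"]

# Side 1: bart-large-mnli, zero-shot classification via NLI entailment.
nli = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
print(nli(text, candidate_labels=labels))

# Side 2: CAPPr + GPT-2 (small), which scores each candidate completion
# by its token probabilities given the prompt and picks the best one.
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
prompt = f"Review: {text}\nThe sentiment of this review is"
print(predict([prompt], completions=labels, model_and_tokenizer=(model, tokenizer)))
```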

If I had a bunch of compute and time, I'd like to benchmark (or find benchmarks for) the following text classification approaches, varying the amount of training data if feasible, and ideally on tasks that are more realistic than SuperGLUE:

  • similarity embeddings (see the sketch after this list)
    • S-BERT
    • GPT-3+ embeddings (OpenAI claims their ada embeddings model is quite good)
  • sampling, i.e., generating the label as text and parsing it
  • MNLI-trained models
  • CAPPr
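
For the similarity-embeddings bullet, here's a minimal sketch with sentence-transformers (the model name and example text are just placeholders I picked, not recommendations):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

text = "The movie was a fantastic watch from start to finish."
labels = ["positive", "negative", "neutral"]

# Embed the input text and each candidate label, then pick the label
# whose embedding is most cosine-similar to the text's embedding.
text_emb = model.encode(text, convert_to_tensor=True)
label_embs = model.encode(labels, convert_to_tensor=True)
scores = util.cos_sim(text_emb, label_embs)[0]
print(labels[scores.argmax().item()])
```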