SquareRootsi t1_isjsnk6 wrote
Reply to comment by EducationalCicada in [R] UL2: Unifying Language Learning Paradigms - Google Research 2022 - 20B parameters outperforming 175B GTP-3 and tripling the performance of T5-XXl on one-shot summarization. Public checkpoints! by Singularian2501
I haven't vetted this yet, but it looks well done at first glance. It compares multiple models across multiple tasks, so you can home in on your specific needs.
https://gem-benchmark.com/results
I think Hugging Face has something similar, but I haven't found all the info on a single, easy-to-compare page there. You have to bounce around between various model cards, tasks, and metrics pages to piece together the same information.