
turnip_burrito t1_j9j8dmm wrote

One critique I saw in another thread is that this was "fine-tuned to hell and back" compared to GPT-3, which could explain some of the increased performance, so take that as you will.

25

Spire_Citron t1_j9j9n46 wrote

Fine-tuned towards taking these sorts of tests, or just more optimised in general?

13

duboispourlhiver t1_j9ji6qe wrote

Yes, the risk is that it's overfitted to this test. I've read that about the paper too but haven't taken the time to form my own opinion. I think it's impossible to judge whether this benchmark says anything about the model's quality without studying it for hours.

18

Spire_Citron t1_j9lcgc9 wrote

If it was specifically trained for this test, it's much less impressive, because it probably won't show that level of intuition and understanding on other tasks.

6

monsieurpooh t1_j9ni0aa wrote

I'm curious how the authors made sure to prevent overfitting. I guess there's always the risk that they didn't, which is why there are those AI competitions that completely withhold the questions from the public until the test is run. Curious to see its performance in those.

2
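The point about withheld questions can be illustrated with a toy sketch (all data here is made up, and the "model" is deliberately trivial): a model that merely memorizes the public benchmark scores perfectly on it, while a secret held-out set exposes the lack of generalization.

```python
def memorizing_model(train_qa):
    """Return a 'model' that just looks up answers it has already seen."""
    return lambda q: train_qa.get(q, "unknown")

def accuracy(model, qa_pairs):
    """Fraction of questions the model answers correctly."""
    return sum(model(q) == a for q, a in qa_pairs.items()) / len(qa_pairs)

# Hypothetical question/answer sets for illustration only.
public_benchmark = {"2+2": "4", "3+3": "6", "5+5": "10"}
withheld_set = {"4+4": "8", "6+6": "12"}  # kept secret until evaluation time

model = memorizing_model(public_benchmark)

print(accuracy(model, public_benchmark))  # 1.0 -- looks impressive
print(accuracy(model, withheld_set))      # 0.0 -- no generalization at all
```

A real model wouldn't be a pure lookup table, of course, but the same logic motivates holding back test questions: a score on data the model may have trained on can't distinguish understanding from memorization.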

Borrowedshorts t1_j9kcmhm wrote

Humans finetune to the test as well.

3

dwarfarchist9001 t1_j9kpzs8 wrote

Humans don't suffer from overfitting if they train on the same data too much.

2

skob17 t1_j9kt3k1 wrote

Oh, they absolutely do. If the test questions take even a slightly different approach, many of the students who learned by rote memorization fail.

7