
currentscurrents t1_jdrt3gv wrote

I think all tests designed for humans are worthless here.

They're all meant to compare humans against each other, so they assume you don't have the ability to read and remember the entire internet. You can make up for a lack of reasoning with an abundance of data. We need synthetic tests designed specifically for LLMs.


Yecuken t1_jdsm4w1 wrote

Tests would not help against optimization; models will just learn how to pass the test. Optimization will always win against any problem with a known solution.