Laser_Plasma t1_j4pws5y wrote

The whole "benchmark" is just a Readme? What is this nonsense?

mrconter1 OP t1_j4q4o7t wrote

I will upload the data and accompanying website soon. What do you think about the idea?

Laser_Plasma t1_j4q544u wrote

I think ideas are cheap (“benchmark of AGI-like capabilities”), and this particular execution of the idea (closing a window in a browser?) isn’t really good in any way.

mrconter1 OP t1_j4qctlb wrote

The thing is that there are a lot of other screenshots + instructions as well. What would a system that can get 100% on this benchmark not be able to do?
