Laser_Plasma t1_j4q544u wrote

I think ideas are cheap (“benchmark of AGI-like capabilities”), and this particular execution of the idea (closing a window in a browser?) isn’t really good in any way

2

mrconter1 OP t1_j4qctlb wrote

The thing is that there are a lot of other screenshots + instructions as well. What would a system that can get 100% on this benchmark not be able to do?

−3