
mrconter1 OP t1_j4tuaal wrote

> This is not testing intelligence, this is testing if human was trained on computer usage, knows what e-mail is and used gmail before.

I don't think it's binary. I think intelligence plays a large part here.

> Someone from tribe in Africa would fail your test while he is human and is intelligent,

Could you train a bird to pass all questions on this benchmark? No. Because it's not as intelligent as a human.

> train him on this task like you would train current gen multimodal system and it will pass your benchmark. You train LLM in combination with image model and RL model, train on instruction following using inputs you described and now it understands what it sees, can follow what you want it to do.

Solving this benchmark is an easy problem? How long do you think it will take until we have a model that can casually solve all the instructions I gave in the previous comment?
