
Soupjoe5 OP t1_ixulokm wrote

2

“Video is a training resource with a lot of potential,” says Peter Stone, executive director of Sony AI America, who has previously worked on imitation learning.

Imitation learning is an alternative to reinforcement learning, in which a neural network learns to perform a task from scratch via trial and error. Reinforcement learning is the technique behind many of the biggest AI breakthroughs in the last few years. It has been used to train models that can beat humans at games, control a fusion reactor, and discover a faster way to do fundamental math.

The problem is that reinforcement learning works best for tasks that have a clear goal, where random actions can lead to accidental success. Reinforcement learning algorithms reward those accidental successes to make them more likely to happen again.
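As a rough illustration of that reward loop, here is a minimal Python sketch. It assumes a toy task in which exactly one of four actions counts as "success"; the function name `train`, its parameters, and the update rule are all hypothetical and are not taken from OpenAI's system or any project mentioned in the article.

```python
import random

def train(num_actions=4, goal_action=2, episodes=500, lr=0.1, seed=0):
    """Toy trial-and-error learner (illustrative assumption, not a real system).

    Exactly one action, `goal_action`, yields a reward.
    """
    rng = random.Random(seed)
    prefs = [1.0] * num_actions  # equal preferences, so early choices are random
    for _ in range(episodes):
        # Pick an action with probability proportional to its preference.
        action = rng.choices(range(num_actions), weights=prefs)[0]
        reward = 1.0 if action == goal_action else 0.0
        # Reinforce: a rewarded action becomes more likely to be picked again.
        prefs[action] += lr * reward
    total = sum(prefs)
    return [p / total for p in prefs]

probs = train()
print(probs)  # the goal action ends up with the largest share
```

The sketch only works because random choices hit the goal often enough to be rewarded. When success requires a long, specific chain of actions, random exploration almost never stumbles on it, which is the limitation the paragraph above describes.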

But Minecraft is a game with no clear goal. Players are free to do what they like, wandering a computer-generated world, mining different materials and combining them to make different objects.

Minecraft’s open-endedness makes it a good environment for training AI. Baker was one of the researchers behind Hide & Seek, a project in which bots were let loose in a virtual playground where they used reinforcement learning to figure out how to cooperate and use tools to win simple games. But the bots soon outgrew their surroundings. “The agents kind of took over the universe, there was nothing else for them to do,” says Baker. “We wanted to expand it and we thought Minecraft was a great domain to work in.”

They’re not alone. Minecraft is becoming an important testbed for new AI techniques. MineDojo, a Minecraft environment with dozens of predesigned challenges, won an award at this year’s NeurIPS, one of the biggest AI conferences.

Using VPT, OpenAI’s bot was able to carry out tasks that would have been impossible using reinforcement learning alone, such as crafting planks and turning them into a table, which involves around 970 consecutive actions. Even so, the researchers found that the best results came from using imitation learning and reinforcement learning together. Taking a bot trained with VPT and fine-tuning it with reinforcement learning allowed it to carry out tasks involving more than 20,000 consecutive actions.


Soupjoe5 OP t1_ixulp6k wrote

3

The researchers claim that their approach could be used to train AI to carry out other tasks. To begin with, it could be used for bots that use a keyboard and mouse to navigate websites, book flights or buy groceries online. But in theory it could be used to train robots to carry out physical, real-world tasks by copying first-person video of people doing those things. “It’s plausible,” says Stone.

Matthew Guzdial at the University of Alberta, Canada, who has used videos to teach AI the rules of games like Super Mario Bros., does not think it will happen any time soon, however. Actions in games like Minecraft and Super Mario Bros. are performed by pressing buttons. Actions in the physical world are far more complicated and harder for a machine to learn. “It unlocks a whole mess of new research problems,” says Guzdial.

“This work is another testament to the power of scaling up models and training on massive datasets to get good performance,” says Natasha Jaques, who works on multi-agent reinforcement learning at Google and the University of California, Berkeley.

Large internet-sized data sets will certainly unlock new capabilities for AI, says Jaques. “We've seen that over and over again, and it's a great approach.” But OpenAI places a lot of faith in the power of large data sets alone, she says: “Personally, I'm a little more skeptical that data can solve any problem.”

Still, Baker and his colleagues think that collecting more than a million hours of Minecraft videos will make their AI even better. It’s probably the best Minecraft-playing bot yet, says Baker: “But with more data and bigger models I would expect it to feel like you're watching a human playing the game, as opposed to a baby AI trying to mimic a human.”
