The Minecraft video game could create the next generation of AI

OpenAI has built the most capable Minecraft-playing bot to date by making it watch more than 70,000 hours of video of people playing the game. It is a powerful new technique that could be used to train machines to perform a wide range of tasks by tapping into platforms like YouTube, a vast and untapped source of data for developing artificial intelligence (AI).

The model OpenAI trained on Minecraft learned to carry out complex sequences of keyboard presses and mouse clicks to complete in-game tasks such as chopping down trees and crafting tools. It is the first bot capable of crafting so-called “diamond” tools, a task that usually takes good human players 20 minutes of high-speed clicking, or about 24,000 actions.

The results are a breakthrough for a technique known as “imitation learning,” in which neural networks are trained to perform tasks by watching humans perform them. Imitation learning can be used to train AI to control robot arms, drive cars, or browse the web.

Following the example of GPT-3

A vast number of videos posted online show people performing different tasks. By exploiting this resource, the researchers hope to do for imitation learning what GPT-3 did for large language models. “In recent years, we’ve seen the rise of this GPT-3 paradigm that allows amazing capabilities to be obtained from large models trained on huge swaths of the internet,” says OpenAI’s Bowen Baker, one of the team members behind the new Minecraft bot. “A lot of this is because we’re modeling what humans do when they’re online.”


The problem with existing approaches to imitation learning is that video demonstrations must be labeled at every step: doing this action causes this, doing this action causes that, and so on. Annotating by hand in this way is a lot of work and therefore datasets of this type tend to be small. Bowen Baker and his colleagues wanted to find a way to transform the millions of videos available on the web into a new dataset.

The team’s approach, called Video Pre-Training (VPT), gets around this bottleneck by training a second neural network to label videos automatically. The researchers first hired gamers to play Minecraft and recorded their keyboard presses and mouse clicks alongside the on-screen footage. This gave them 2,000 hours of annotated gameplay, which they used to train a model to match actions to on-screen outcomes. For example, clicking a mouse button in a certain situation makes the character swing its axe.

The next step was to use this model to generate action labels for 70,000 hours of unlabeled video from the internet, and then to train the Minecraft bot on this larger dataset.
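The two-stage idea described above (learn a labeler from a small annotated set, use it to label a much larger unannotated set, then imitate the result) can be illustrated with a deliberately tiny sketch. This is not OpenAI's implementation: the one-dimensional "world", the function names, and the counting-based "models" are all assumptions made for illustration, standing in for video frames and neural networks.

```python
from collections import Counter, defaultdict

# Toy stand-in for Minecraft (an assumption for illustration): a "frame" is
# just the player's position on a line, and an action moves it by -1, 0 or +1.

def train_inverse_dynamics(labeled_clips):
    """Learn which action most often explains a given change between frames."""
    counts = defaultdict(Counter)
    for before, after, action in labeled_clips:
        counts[after - before][action] += 1
    return {change: seen.most_common(1)[0][0] for change, seen in counts.items()}

def pseudo_label(video, idm):
    """Attach an inferred action to each pair of consecutive frames."""
    steps = []
    for before, after in zip(video, video[1:]):
        change = after - before
        if change in idm:  # skip transitions the labeler has never seen
            steps.append((before, idm[change]))
    return steps

def clone_behavior(steps):
    """Imitation step: for each frame, copy the action seen most often there."""
    counts = defaultdict(Counter)
    for frame, action in steps:
        counts[frame][action] += 1
    return {frame: seen.most_common(1)[0][0] for frame, seen in counts.items()}

# Small, expensive dataset: recorded play with the actions logged.
labeled_clips = [(0, 1, "right"), (4, 3, "left"), (2, 2, "stay"),
                 (7, 8, "right"), (1, 0, "left")]

# Large, cheap dataset: footage only, no actions. Every "player" walks to 5.
unlabeled_videos = [[0, 1, 2, 3, 4, 5, 5], [9, 8, 7, 6, 5, 5], [3, 4, 5, 5]]

idm = train_inverse_dynamics(labeled_clips)
steps = [step for video in unlabeled_videos for step in pseudo_label(video, idm)]
policy = clone_behavior(steps)

print(policy[3], policy[7], policy[5])
```

The point of the design is that the expensive hand-labeled data is only needed to train the labeler; the behavior itself is learned from the much larger pool of footage that never had labels.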

“Video is a training resource with great potential,” says Peter Stone, executive director of Sony AI America, who has previously worked on imitation learning.

A technique behind recent major advances in AI

Imitation learning is an alternative to reinforcement learning, in which a neural network learns to perform a task from scratch through trial and error. Reinforcement learning is behind many of the biggest advances in AI in recent years. It has been used to train models that can beat humans at games, control a fusion reactor, and discover a faster way to do fundamental math.

The problem is that reinforcement learning works best for tasks that have a clear goal and where random actions can lead to accidental success. Reinforcement learning algorithms reward these accidental successes to make them more likely to happen again.
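The mechanism of rewarding accidental successes can be sketched with a toy example. The action names, the reward rule, and the weight update below are assumptions chosen for illustration, not a description of any system mentioned in this article: the agent starts out acting at random, and each time it happens to pick the rewarded action, that action becomes more likely to be picked again.

```python
import random

# Hypothetical action set; only "chop" earns a reward in this toy task.
ACTIONS = ["walk", "jump", "look_around", "chop"]

def train(steps=200, seed=0):
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    weights = {action: 1.0 for action in ACTIONS}  # start fully random
    for _ in range(steps):
        # Sample an action with probability proportional to its weight.
        action = rng.choices(ACTIONS, weights=[weights[a] for a in ACTIONS])[0]
        reward = 1.0 if action == "chop" else 0.0
        # Reinforce: a rewarded action becomes more likely next time.
        weights[action] += reward
    return weights

weights = train()
best_action = max(weights, key=weights.get)
print(best_action, weights)
```

This works here because a single random action can stumble on the reward. If the reward only arrived after a specific sequence of hundreds of actions, random exploration would almost never trigger it, and there would be nothing to reinforce.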

Minecraft, however, has no clear goal. Players are free to do whatever they want, wandering around a computer-generated world, mining different materials and combining them to craft different items.

Minecraft is becoming an important test bed for new AI techniques

The open-ended nature of Minecraft makes it a good environment for training AI. Bowen Baker is one of the researchers behind Hide & Seek, a project in which bots were dropped into a virtual playground and used reinforcement learning to work out how to cooperate and use tools to win at simple games. However, the bots soon outgrew their surroundings. “They kind of took over the universe. They had nothing else to do,” says Baker. “We wanted to expand it, and we thought that Minecraft was an ideal domain to work in.”

They are not the only ones. Minecraft is becoming an important test bed for new AI techniques. MineDojo, a Minecraft environment with dozens of pre-made challenges, won an award at this year’s NeurIPS conference, one of the biggest AI conferences.

Using VPT, OpenAI’s bot was able to complete tasks that would have been impossible using reinforcement learning alone, such as crafting planks and turning them into a table, which involves around 970 consecutive actions. Even so, the team found that the best results came from using imitation learning and reinforcement learning together. Taking a bot trained with VPT and fine-tuning it with reinforcement learning allowed it to perform tasks involving more than 20,000 consecutive actions.


Can VPT help train robots to perform physical tasks in the real world?

The researchers say their approach could be used to train AI to perform other tasks. To begin with, it could be used to train bots that use a keyboard and mouse to browse websites, book flights or buy food online. In theory, it could also be used to train robots to perform physical tasks in the real world by watching videos and copying the movements of the people in them. “It’s plausible,” says Peter Stone.

However, Matthew Guzdial of the University of Alberta in Canada, who has used videos to teach AIs the rules of games like Super Mario Bros, doesn’t think that will happen anytime soon. Actions in games like Minecraft and Super Mario Bros are performed by pressing buttons. Actions in the physical world are far more complicated and much harder for a machine to learn. “This opens up a whole host of new research problems,” he says.

“This work is further proof of the power of scaling models and training on large datasets to achieve good performance,” says Natasha Jaques, who works on multi-agent reinforcement learning at Google and at the University of California, Berkeley.

These large, internet-scale datasets will most certainly unlock new capabilities for AI, she says: “We’ve seen it time and time again and it’s a great approach.” However, OpenAI places a lot of hope in the power of large datasets alone, she continues. “Personally, I’m a bit more skeptical that data can solve any problem.”

Still, Bowen Baker and his colleagues think collecting over a million hours of Minecraft footage will make their AI even better. It is probably the best-performing Minecraft bot to date, he says: “With more data and larger models available, I would expect it to feel like you’re watching a human play the game, as opposed to a baby AI trying to imitate a human.”

Article by Will Douglas Heaven, translated from English by Kozi Pastakia.

