A group at DeepMind called the Open-Ended Learning Team has developed a new way to train AI systems to play games. Rather than exposing the system to millions of prior games, as is done with other game-playing AI systems, the group has given the agents in its new system a set of minimal skills that they use to achieve a simple goal (such as spotting another player in a virtual world) and then build on. The researchers created XLand, a colorful virtual world with the look of a generic video game. In it, AI players, which the researchers call agents, set off to achieve a general goal, and as they do, they acquire skills that they can use to achieve other goals. The researchers then switch the game around, giving the agents a new goal but allowing them to retain the skills they learned in prior games. The group has written a paper describing its efforts and has posted it on the arXiv preprint server.
One example of the technique involves an agent trying to reach a part of its world that is too high to climb onto directly and that has no access points such as stairs or ramps. Bumbling around, the agent discovers that it can move a flat object into place to serve as a ramp and thus make its way up to where it needs to go. To allow their agents to learn more skills, the researchers created 700,000 scenarios, or games, in which the agents faced approximately 3.4 million unique tasks. With this approach, the agents were able to teach themselves how to play multiple games, such as tag, capture the flag and hide and seek. The researchers describe their approach as endlessly challenging. Another interesting aspect of XLand is a sort of overlord, an entity that keeps tabs on the agents, notes which skills they are learning and then generates new games to strengthen those skills. As a result, the agents will keep learning as long as they are given new tasks.
While running their virtual world, the researchers found that the agents picked up new skills, generally by accident, that proved useful, and then built on them. This led to more advanced skills such as resorting to experimentation when running out of options, cooperating with other agents and using objects as tools. The researchers suggest their approach is a step toward creating generally capable algorithms that learn how to play new games on their own, an ability that might one day be used by autonomous robots.