A virtual robot arm has learned to solve a wide range of different puzzles, including stacking blocks, setting a table, and arranging chess pieces, without having to be retrained for each task. It did this by playing against a second robot arm that was trained to give it increasingly difficult challenges.
Self-play: Developed by researchers at OpenAI, two identical robot arms, Alice and Bob, learn by playing against each other in a simulation, without human intervention. The robots use reinforcement learning, a technique in which AIs learn by trial and error which actions to take in different situations to achieve certain goals. The game consists of moving objects on a virtual table. By arranging the objects in specific ways, Alice tries to create puzzles that are difficult for Bob to solve. Bob tries to solve Alice’s puzzles. As they learn, Alice poses more complex puzzles and Bob gets better at solving them.
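The game between Alice and Bob can be illustrated with a toy sketch. This is not OpenAI's implementation: there is no robot simulation and no real reinforcement learning here. The function names, the one-dimensional "table", and the two counters that stand in for learned policies (`alice_moves`, `bob_budget`) are all assumptions made for illustration. Bob is rewarded for solving a puzzle and Alice for stumping him, which is one plausible reading of the setup described above.

```python
import random


def alice_propose(start, n_moves, rng):
    """Alice rearranges the objects with n_moves random unit moves.

    The resulting arrangement becomes Bob's goal.
    """
    state = list(start)
    for _ in range(n_moves):
        i = rng.randrange(len(state))
        state[i] += rng.choice((-1, 1))
    return tuple(state)


def bob_solve(start, goal, budget):
    """Bob moves one object by one cell per step.

    He succeeds if the total distance to the goal fits within his budget.
    """
    distance = sum(abs(g - s) for s, g in zip(start, goal))
    return distance <= budget


def self_play(rounds=200, seed=0):
    rng = random.Random(seed)
    start = (0, 0, 0)  # three objects on a 1-D "table" (hypothetical)
    alice_moves = 1    # stand-in for Alice's policy: puzzle difficulty
    bob_budget = 1     # stand-in for Bob's policy: how far he can plan
    history = []
    for _ in range(rounds):
        goal = alice_propose(start, alice_moves, rng)
        solved = bob_solve(start, goal, bob_budget)
        # Assumed reward scheme: Bob scores for solving, Alice for stumping Bob.
        bob_reward, alice_reward = (1, 0) if solved else (0, 1)
        if solved:
            alice_moves += 1  # Alice escalates to harder puzzles
        else:
            bob_budget += 1   # Bob "improves" after a failure
        history.append((alice_moves, bob_budget, alice_reward, bob_reward))
    return history


if __name__ == "__main__":
    final_alice, final_bob, _, _ = self_play()[-1]
    print("Alice difficulty:", final_alice, "Bob budget:", final_bob)
```

Even in this crude form, the two counters ratchet each other upward: every round either makes Alice's puzzles harder or makes Bob more capable, which mirrors the automatic curriculum the article describes.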
Multitasking: Deep learning models generally need to be retrained between tasks. For example, AlphaZero (which also learns by playing against itself) uses a single algorithm to learn to play chess, shogi, and Go, but only one game at a time. The AlphaZero that plays chess cannot play Go, and the one that plays Go cannot play shogi. Building truly multitasking machines is a major unsolved problem on the road to a more general AI.
Dojo AI: One of the problems is that training an AI to multitask requires a lot of examples. OpenAI avoids this by training Alice to generate the examples for Bob, using one AI to train another. Alice learned to set goals for herself, such as building a tower of blocks, then picking it up and balancing it. Bob learned to use properties of the (virtual) environment, such as friction, to grab and rotate objects.
Virtual reality: So far, the approach has only been tested in a simulation, but researchers at OpenAI and elsewhere are getting better at transferring models trained in virtual environments to physical ones. A simulation lets AIs work through large amounts of data in a short period of time before they are fine-tuned for real-world settings.
General ambition: Researchers say their ultimate goal is to train a robot to solve any task a person might ask of it. Like GPT-3, a language model that can use language in different ways, these robot arms are part of OpenAI’s overall ambition to build multitasking AI. Using one AI to train another could be a key part of this.