A.I. Research Mines 'Minecraft' to Mimic Human Learning

Archimedes said that with a long enough lever he could move the world. Computer scientists have tried a similar approach in artificial intelligence: with enough computing power, a ‘bot can brute-force its way through any problem, evaluating every possible option and then picking the best. Unfortunately, just as Archimedes never quite found a long enough lever (nor a “fulcrum on which to place it”), computer scientists know the limits of brute strength. For example, the first paper about computer chess, written in 1950 by Claude Shannon, showed that just three moves out, brute force already has to evaluate 10^9 positions. To look five moves into the future, a computer able to evaluate a million positions per second would take more than 30 years. Yikes.

That’s why humans don’t use brute force–our brains are puny little processors compared to the mighty power of silicon and electricity. Instead, both beginning and expert players evaluate only about 40-50 possible moves and then choose what they see as the best one. Only, expert players have better search “heuristics”–the strategies they use to narrow down their choices, or in the language of computer science, to “limit the search space.” Since the dawn of A.I., computer scientists have been working to mimic these human search heuristics.

Enter Minecraft.

“The whole of Minecraft is what we refer to as ‘A.I. complete.’ If you can do all of Minecraft you could solve anything,” says Stefanie Tellex, assistant professor in the Computer Science Department at Brown University, and senior author of a paper being presented right now at the 25th International Conference on Automated Planning and Scheduling in Jerusalem, Israel.

Part of what makes Minecraft such a difficult A.I. puzzle is that robots are easily distracted. In the game, if the goal is to smelt gold in a furnace, a ‘bot may have to recognize that digging through a wall is an irrelevant task. It’s like sacrificing a queen for a pawn: chess experts wouldn’t even see it as an option, but, without a similar heuristic, a computer would have to recognize, evaluate, and discard the option.

In fact, as Tellex points out, distraction is a major challenge in real-world A.I. tasks: “When making brownies, actions related to the oven and flour are important, while those involving the soy sauce and sauté pan are not. For a different task, such as stir-frying broccoli, the robot must use a different set of objects and actions,” she writes.

It’s not a problem when the only irrelevant option in a Minecraft scenario is digging through a wall, but what about when there are 10 irrelevant options, or 10^9 irrelevant options? Suddenly the “search space” explodes and the problem takes a supercomputer and a lifetime of waiting while it spins through the possibilities.

That is, unless you start small and learn. That’s the approach that Tellex and her team take with Minecraft. The team started with five tasks: bridge construction, gold smelting, tunneling through walls, digging to find an object, and path planning. For each task, they randomly generated 20 simple training scenarios, each with only 1,000-10,000 possible “states”–like a chessboard with only a few pieces on a few squares.

“It’s able to learn that if you’re standing next to a trench and you’re trying to walk across, you can place blocks in the trench. Otherwise don’t place blocks. If you’re trying to mine some gold under some blocks, destroy the blocks. Otherwise don’t destroy blocks,” Tellex said in a Brown University press release.

Once the team’s ‘bot had brute-forced its way through these simple scenarios, it was ready for the test: 20 more scenarios, but this time with 50,000-1,000,000 possible states.

How much computer processing time is required to gain rewards in the game? Well, after training it took less processing to earn more reward, or in the language of Tellex and her team, “These priors allow an agent to efficiently prune actions based on learned knowledge, significantly reducing the number of state-action pairs the agent needs to evaluate in order to act near optimally.”

Though technically and mathematically this work turns out to be incredibly complex, the gist of the experiment is computer learning in a nutshell: you train it and it gets better. The other gist of this line of work is that if you can solve Minecraft, you can solve anything. With a tweak and a twist, the group now has a ‘bot capable of learning in the full version of the Minecraft universe.

In addition to evolving toward strategies that could influence the way humans play Minecraft, the team expects that mining Minecraft will influence the way robots play in the real world. When they stuck their Minecraft learning algorithm in a ‘bot with arms, it learned to bake brownies in a space with 47,300,000 possible states. While successfully solving a Minecraft task may offer little more than a profound sense of accomplishment and personal worth, the lessons researchers take from the game lets us have brownies. And it doesn’t take a PhD in computer learning to understand the universal truth that brownies are good.

Liked it? Take a second to support GeekDad and GeekMom on Patreon!