Update (August 17th, 2017): OpenAI has published a blog post with more details about the bot. Almost everything of the post below still holds true, however. OpenAI’s post is sparse on technical details as they “not ready to talk about agent internals — the team is focused on solving 5v5 first.”. See this tweetstorm by @smerity for a good analysis.
When I read today’s news about OpenAI’s DotA 2 bot beating human players at The International, an eSports tournament with a prize pool of over $24M, I was jumping with excitement. For one, I am a big eSports fan. I have never played DotA 2, but I regularly watch other eSports competitions on Twitch and even played semi-professionally when I was in high school. But more importantly, multiplayer online battle arena (MOBA) games like DotA and real-time strategy (RTS) games like Starcraft 2, are seen as being way beyond the capabilities of current Artificial Intelligence techniques. These games require long-term strategic decision making, multiplayer cooperation, and have significantly more complex state and action spaces than Chess, Go, or Atari, all of which have been “solved” by AI techniques over the past decades. DeepMind has been working on Starcraft 2 for a while and just recently released their research environment. So far no researchers have managed to make significant breakthroughs. It is thought that we are at least 1-2 years away from beating good human players at Starcraft 2.
That’s why the OpenAI news came as such a shock. How can this be true? Have there been recent breakthroughs that I wasn’t aware of? As I started looking more into what exactly the DotA 2 bot was doing, how it was trained, and what game environment it was in, I came to the conclusion that it’s an impressive achievement, but not the AI breakthrough the press would like you to believe it is. That’s what this post is about. I would like to offer a sober explanation of what’s actually new. There is a real danger of overhyping Artificial Intelligence progress, nicely captured by misleading tweets like these:
OpenAI first ever to defeat world's best players in competitive eSports. Vastly more complex than traditional board games like chess & Go.
— Elon Musk (@elonmusk) August 12, 2017
Nobody likes being regulated, but everything (cars, planes, food, drugs, etc) that's a danger to the public is regulated. AI should be too.
— Elon Musk (@elonmusk) August 12, 2017
Let me start out by saying that none of the hype or incorrect assumptions is the fault of OpenAI researchers. OpenAI has traditionally been very straightforward and explicit about the limitations of their research contributions. I am sure it will be the same in this case. OpenAI has not yet published technical details of their solution, so it is easy to jump to wrong conclusions for people not in the field.
Let’s start out by looking at how difficult the problem that the DotA 2 bot is solving actually is. How does it compare to something like AlphaGo?
- 1v1 is not comparable to 5v5. In a typical game of DotA 2, a team of 5 plays against another team of 5 players. These games require high-level strategy, team communication and coordination, and typically take around 45 minutes. 1v1 games are much more restricted. Two players basically move down a single lane and try to kill each other. It’s typically over in a few minutes. Beating an opponent in 1v1 requires mechanical skill and short-term tactics, but none of the things, like long term planning or coordination, that are challenging for current AI techniques. In fact, the number of useful actions you can take is less than in a game of Go. The effective state space (the player’s idea of what’s currently going on in the game), if represented in a smart way, should be smaller than in Go as well.
- Bots have access to more information: The OpenAI bot was built on top of the game’s bot API, giving it access to all kinds of information humans do not have access to. Even if OpenAI researchers restricted access to certain kinds of information, the bot still has access to more exact information than humans. For example, a skill may only hit an opponent within a certain range and a human player must look at the screen and estimate the current distance to the opponent. That takes practice. The bot knows the exact distance and can make an immediate decision to use the skill or not. Having access to all kinds of exact numerical information is a big advantage. In fact, during the game, one could see the bot executing skills at the maximum distance several times.
- Reaction Times: Bots can react instantly, human’s can’t. Coupled with the information advantage from above this is another big advantage. For example, once the opponent is out of range for a specific skill a bot can immediately cancel it.
- Learning to play a single specific character: There are 100 different characters with different innate abilities and strengths. The only character the bot learns to play, Shadow Fiend, generally does immediate attacks (as opposed to more complex skills lasting over a period of time) and benefits from knowing exact distances and having fast reactions times – exactly what a bot is good at.
- Hard-coded restrictions: The bot was not trained from scratch knowing nothing about the game. Item choices were hardcoded, and so were certain techniques, such as creep block, that were deemed necessary to win. It seems like what was learned is mostly the interaction with the opponent.
Given that 1v1 is mostly a game of mechanical skill, it is not surprising that a bot beats human players. And given the severely restricted environment, the artificially restricted set of possible actions, and that there was little to no need for long-term planning or coordination, I come to the conclusion that this problem was actually significantly easier than beating a human champion in the game of Go. We did not make sudden progress in AI because our algorithms are so smart – it worked because our researchers are smart about setting up the problem in just the right way to work around the limitations of current techniques. The training time for the bot, said to be around 2 weeks, suggests the same. AlphaGo required several months of highly distributed large-scale training on Google’s GPU clusters. We’ve made some progress since then, but not something that reduces computational requirements by an order of magnitude.
Now, enough with the criticism. The work may be a little overhyped by the press, but there are in fact some extremely cool and surprising things about it. And clearly, a large amount of challenging engineering work and partnership building must have gone into making this happen.
- Trained entirely through self-play: The bot does not need any training data. It does not learn from human demonstrations either. It starts out completely random and keeps playing against itself. While this technique is nothing new, it is surprising (at least to me) that the bot learns techniques that human players are also known to use, as suggested by comments (here and here). I don’t know enough about the DotA 2 to judge this, but I think it’s extremely cool. There may be other techniques the bot has learned but humans are not even aware of. This is similar to what we’ve seen with AlphaGo, where human players started to learn from its unintuitive moves and adjusted their own game play. (Update: It has been confirmed that certain techniques were hardcoded, so it is unclear what exactly is learned)
- A major step for AI + eSports: Having challenging environments, such as DotA 2 and Starcraft 2, to test new AI techniques on is extremely important. If we can convince the eSports community and game publishers that we can provide value by applying AI techniques to games, we can expect a lot of support in return, and this may result in much faster AI progress.
- Partially Observable environments: While the details of how OpenAI researchers handled this with the API are unclear, a human player only sees what’s on the screen and may have a restricted set of view e.g. uphill. This means, unlike with games like Go or Chess or Atari (and more like Poker) we are in a partially observable environment – we don’t have access to full information about the current game state. Such problems are typically much harder to solve and an active area of research where progress is severely needed. That being said, it is unclear how much partial observability in a 1v1 DotA 2 match really matters – there isn’t too much to strategize about.
Above all, I’m very excited to read OpenAI’s technical report of what actually went into building this.
Thanks to @smerity for useful feedback, suggestions, and DotA knowledge.