Hype or Not? Some Perspective on OpenAI’s DotA 2 Bot

See the Hacker News Discussion for additional context.

Update (August 17th, 2017): OpenAI has published a blog post with more details about the bot. Almost everything of the post below still holds true, however. OpenAI’s post is sparse on technical details as they “not ready to talk about agent internals — the team is focused on solving 5v5 first.”. See this tweetstorm by @smerity for a good analysis.

When I read today’s news about OpenAI’s DotA 2 bot beating human players at The International, an eSports tournament with a prize pool of over $24M, I was jumping with excitement. For one, I am a big eSports fan. I have never played DotA 2, but I regularly watch other eSports competitions on Twitch and even played semi-professionally when I was in high school. But more importantly, multiplayer online battle arena (MOBA) games like DotA and real-time strategy (RTS) games like Starcraft 2, are seen as being way beyond the capabilities of current Artificial Intelligence techniques. These games require long-term strategic decision making, multiplayer cooperation, and have significantly more complex state and action spaces than Chess, Go, or Atari, all of which have been “solved” by AI techniques over the past decades. DeepMind has been working on Starcraft 2 for a while and just recently released their research environment. So far no researchers have managed to make significant breakthroughs. It is thought that we are at least 1-2 years away from beating good human players at Starcraft 2.

That’s why the OpenAI news came as such a shock. How can this be true? Have there been recent breakthroughs that I wasn’t aware of? As I started looking more into what exactly the DotA 2 bot was doing, how it was trained, and what game environment it was in, I came to the conclusion that it’s an impressive achievement, but not the AI breakthrough the press would like you to believe it is. That’s what this post is about. I would like to offer a sober explanation of what’s actually new. There is a real danger of overhyping Artificial Intelligence progress, nicely captured by misleading tweets like these:

Let me start out by saying that none of the hype or incorrect assumptions is the fault of OpenAI researchers. OpenAI has traditionally been very straightforward and explicit about the limitations of their research contributions. I am sure it will be the same in this case. OpenAI has not yet published technical details of their solution, so it is easy to jump to wrong conclusions for people not in the field.

Let’s start out by looking at how difficult the problem that the DotA 2 bot is solving actually is. How does it compare to something like AlphaGo?

  • 1v1 is not comparable to 5v5. In a typical game of DotA 2, a team of 5 plays against another team of 5 players. These games require high-level strategy, team communication and coordination, and typically take around 45 minutes. 1v1 games are much more restricted. Two players basically move down a single lane and try to kill each other. It’s typically over in a few minutes. Beating an opponent in 1v1 requires mechanical skill and short-term tactics, but none of the things, like long term planning or coordination, that are challenging for current AI techniques. In fact, the number of useful actions you can take is less than in a game of Go. The effective state space (the player’s idea of what’s currently going on in the game), if represented in a smart way, should be smaller than in Go as well.
  • Bots have access to more information: The OpenAI bot was built on top of the game’s bot API, giving it access to all kinds of information humans do not have access to. Even if OpenAI researchers restricted access to certain kinds of information, the bot still has access to more exact information than humans. For example, a skill may only hit an opponent within a certain range and a human player must look at the screen and estimate the current distance to the opponent. That takes practice. The bot knows the exact distance and can make an immediate decision to use the skill or not. Having access to all kinds of exact numerical information is a big advantage. In fact, during the game, one could see the bot executing skills at the maximum distance several times.
  • Reaction Times: Bots can react instantly, human’s can’t. Coupled with the information advantage from above this is another big advantage. For example, once the opponent is out of range for a specific skill a bot can immediately cancel it.
  • Learning to play a single specific character: There are 100 different characters with different innate abilities and strengths. The only character the bot learns to play, Shadow Fiend, generally does immediate attacks (as opposed to more complex skills lasting over a period of time) and benefits from knowing exact distances and having fast reactions times – exactly what a bot is good at.
  • Hard-coded restrictions: The bot was not trained from scratch knowing nothing about the game. Item choices were hardcoded, and so were certain techniques, such as creep block, that were deemed necessary to win. It seems like what was learned is mostly the interaction with the opponent.

Given that 1v1 is mostly a game of mechanical skill, it is not surprising that a bot beats human players. And given the severely restricted environment, the artificially restricted set of possible actions, and that there was little to no need for long-term planning or coordination, I come to the conclusion that this problem was actually significantly easier than beating a human champion in the game of Go. We did not make sudden progress in AI because our algorithms are so smart – it worked because our researchers are smart about setting up the problem in just the right way to work around the limitations of current techniques. The training time for the bot, said to be around 2 weeks, suggests the same. AlphaGo required several months of highly distributed large-scale training on Google’s GPU clusters. We’ve made some progress since then, but not something that reduces computational requirements by an order of magnitude.

Now, enough with the criticism. The work may be a little overhyped by the press, but there are in fact some extremely cool and surprising things about it. And clearly, a large amount of challenging engineering work and partnership building must have gone into making this happen.

  • Trained entirely through self-play: The bot does not need any training data. It does not learn from human demonstrations either. It starts out completely random and keeps playing against itself. While this technique is nothing new, it is surprising (at least to me) that the bot learns techniques that human players are also known to use, as suggested by comments (here and here). I don’t know enough about the DotA 2 to judge this, but I think it’s extremely cool. There may be other techniques the bot has learned but humans are not even aware of. This is similar to what we’ve seen with AlphaGo, where human players started to learn from its unintuitive moves and adjusted their own game play. (Update: It has been confirmed that certain techniques were hardcoded, so it is unclear what exactly is learned)
  • A major step for AI + eSports: Having challenging environments, such as DotA 2 and Starcraft 2, to test new AI techniques on is extremely important. If we can convince the eSports community and game publishers that we can provide value by applying AI techniques to games, we can expect a lot of support in return, and this may result in much faster AI progress.
  • Partially Observable environments: While the details of how OpenAI researchers handled this with the API are unclear, a human player only sees what’s on the screen and may have a restricted set of view e.g. uphill. This means, unlike with games like Go or Chess or Atari (and more like Poker) we are in a partially observable environment – we don’t have access to full information about the current game state. Such problems are typically much harder to solve and an active area of research where progress is severely needed. That being said, it is unclear how much partial observability in a 1v1 DotA 2 match really matters – there isn’t too much to strategize about.

Above all, I’m very excited to read OpenAI’s technical report of what actually went into building this.

Thanks to @smerity for useful feedback, suggestions, and DotA knowledge.

Learning Reinforcement Learning (with Code, Exercises and Solutions)

Skip all the talk and go directly to the Github Repo with code and exercises.

Why Study Reinforcement Learning

Reinforcement Learning is one of the fields I’m most excited about. Over the past few years amazing results like learning to play Atari Games from raw pixels and Mastering the Game of Go have gotten a lot of attention, but RL is also widely used in Robotics, Image Processing and Natural Language Processing.

Combining Reinforcement Learning and Deep Learning techniques works extremely well. Both fields heavily influence each other. On the Reinforcement Learning side Deep Neural Networks are used as function approximators to learn good representations, e.g. to process Atari game images or to understand the board state of Go. In the other direction, RL techniques are making their way into supervised problems usually tackled by Deep Learning. For example, RL techniques are used to implement attention mechanisms in image processing, or to optimize long-term rewards in conversational interfaces and neural translation systems. Finally, as Reinforcement Learning is concerned with making optimal decisions it has some extremely interesting parallels to human Psychology and Neuroscience (and many other fields).

With lots of open problems and opportunities for fundamental research I think we’ll be seeing multiple Reinforcement Learning breakthroughs in the coming years. And what could be more fun than teaching machines to play Starcraft and Doom?

How to Study Reinforcement Learning

There are many excellent Reinforcement Learning resources out there. Two I recommend the most are:

The latter is still work in progress but it’s ~80% complete. The course is based on the book so the two work quite well together. In fact, these two cover almost everything you need to know to understand most of the recent research papers. The prerequisites are basic Math and some knowledge of Machine Learning.

That covers the theory. But what about practical resources? What about actually implementing the algorithms that are covered in the book/course? That’s where this post and the Github repository comes in. I’ve tried to implement most of the standard Reinforcement Algorithms using Python, OpenAI Gym and Tensorflow. I separated them into chapters (with brief summaries) and exercises and solutions so that you can use them to supplement the theoretical material above. All of this is in the Github repository.

Some of the more time-intensive algorithms are still work in progress, so feel free to contribute. I’ll update this post as I implement them.

Table of Contents

List of Implemented Algorithms

RNNs in Tensorflow, a Practical Guide and Undocumented Features

In a previous tutorial series I went over some of the theory behind Recurrent Neural Networks (RNNs) and the implementation of a simple RNN from scratch. That’s a useful exercise, but in practice we use libraries like Tensorflow with high-level primitives for dealing with RNNs.

With that using an RNN should be as easy as calling a function, right? Unfortunately that’s not quite the case. In this post I want to go over some of the best practices for working with RNNs in Tensorflow, especially the functionality that isn’t well documented on the official site.

The post comes with a Github repository that contains Jupyter notebooks with minimal examples for:

Continue reading “RNNs in Tensorflow, a Practical Guide and Undocumented Features”

Deep Learning for Chatbots, Part 2 – Implementing a Retrieval-Based Model in Tensorflow

The Code and data for this tutorial is on Github.

Retrieval-Based bots

In this post we’ll implement a retrieval-based bot. Retrieval-based models have a repository of pre-defined responses they can use, which is unlike generative models that can generate responses they’ve never seen before. A bit more formally, the input to a retrieval-based model is a context c (the conversation up to this point) and a potential response r. The model outputs is a score for the response. To find a good response you would calculate the score for multiple responses and choose the one with the highest score.

Continue reading “Deep Learning for Chatbots, Part 2 – Implementing a Retrieval-Based Model in Tensorflow”

Deep Learning for Chatbots, Part 1 – Introduction

Chatbots, also called Conversational Agents or Dialog Systems, are a hot topic. Microsoft is making big bets on chatbots, and so are companies like Facebook (M), Apple (Siri), Google, WeChat, and Slack. There is a new wave of startups trying to change how consumers interact with services by building consumer apps like Operator or x.ai, bot platforms like Chatfuel, and bot libraries like Howdy’s Botkit. Microsoft recently released their own bot developer framework.

Many companies are hoping to develop bots to have natural conversations indistinguishable from human ones, and many are claiming to be using NLP and Deep Learning techniques to make this possible. But with all the hype around AI it’s sometimes difficult to tell fact from fiction.

In this series I want to go over some of the Deep Learning techniques that are used to build conversational agents, starting off by explaining where we are right now, what’s possible, and what will stay nearly impossible for at least a little while. This post will serve as an introduction, and we’ll get into the implementation details in upcoming posts.

Continue reading “Deep Learning for Chatbots, Part 1 – Introduction”

Attention and Memory in Deep Learning and NLP

A recent trend in Deep Learning are Attention Mechanisms. In an interview, Ilya Sutskever, now the research director of OpenAI, mentioned that Attention Mechanisms are one of the most exciting advancements, and that they are here to stay. That sounds exciting. But what are Attention Mechanisms?

Attention Mechanisms in Neural Networks are (very) loosely based on the visual attention mechanism found in humans. Human visual attention is well-studied and while there exist different models, all of them essentially come down to being able to focus on a certain region of an image with “high resolution” while perceiving the surrounding image in “low resolution”, and then adjusting the focal point over time.

Continue reading “Attention and Memory in Deep Learning and NLP”

Implementing a CNN for Text Classification in TensorFlow

The full code is available on Github.

In this post we will implement a model similar to Kim Yoon’s Convolutional Neural Networks for Sentence Classification. The model presented in the paper achieves good classification performance across a range of text classification tasks (like Sentiment Analysis) and has since become a standard baseline for new text classification architectures.

Continue reading “Implementing a CNN for Text Classification in TensorFlow”

Understanding Convolutional Neural Networks for NLP

When we hear about Convolutional Neural Network (CNNs), we typically think of Computer Vision. CNNs were responsible for major breakthroughs in Image Classification and are the core of most Computer Vision systems today, from Facebook’s automated photo tagging to self-driving cars.

More recently we’ve also started to apply CNNs to problems in Natural Language Processing and gotten some interesting results. In this post I’ll try to summarize what CNNs are, and how they’re used in NLP. The intuitions behind CNNs are somewhat easier to understand for the Computer Vision use case, so I’ll start there, and then slowly move towards NLP.

Continue reading “Understanding Convolutional Neural Networks for NLP”

Recurrent Neural Network Tutorial, Part 4 – Implementing a GRU/LSTM RNN with Python and Theano

The code for this post is on Github. This is part 4, the last part of the Recurrent Neural Network Tutorial. The previous parts are:

In this post we’ll learn about LSTM (Long Short Term Memory) networks and GRUs (Gated Recurrent Units).  LSTMs were first proposed in 1997 by Sepp Hochreiter and Jürgen Schmidhuber, and are among the most widely used models in Deep Learning for NLP today. GRUs, first used in  2014, are a simpler variant of LSTMs that share many of the same properties.  Let’s start by looking at LSTMs, and then we’ll see how GRUs are different.

Continue reading “Recurrent Neural Network Tutorial, Part 4 – Implementing a GRU/LSTM RNN with Python and Theano”

Recurrent Neural Networks Tutorial, Part 3 – Backpropagation Through Time and Vanishing Gradients

This the third part of the Recurrent Neural Network Tutorial.

In the previous part of the tutorial we implemented a RNN from scratch, but didn’t go into detail on how Backpropagation Through Time (BPTT) algorithms calculates the gradients. In this part we’ll give a brief overview of BPTT and explain how it differs from traditional backpropagation. We will then try to understand the vanishing gradient problem, which has led to the development of  LSTMs and GRUs, two of the currently most popular and powerful models used in NLP (and other areas). The vanishing gradient problem was originally discovered by Sepp Hochreiter in 1991 and has been receiving attention again recently due to the increased application of deep architectures.

Continue reading “Recurrent Neural Networks Tutorial, Part 3 – Backpropagation Through Time and Vanishing Gradients”