RNNs in Tensorflow, a Practical Guide and Undocumented Features

In a previous tutorial series I went over some of the theory behind Recurrent Neural Networks (RNNs) and the implementation of a simple RNN from scratch. That’s a useful exercise, but in practice we use libraries like Tensorflow with high-level primitives for dealing with RNNs.

With that using an RNN should be as easy as calling a function, right? Unfortunately that’s not quite the case. In this post I want to go over some of the best practices for working with RNNs in Tensorflow, especially the functionality that isn’t well documented on the official site.

The post comes with a Github repository that contains Jupyter notebooks with minimal examples for:

Continue reading

Attention and Memory in Deep Learning and NLP

A recent trend in Deep Learning are Attention Mechanisms. In an interview, Ilya Sutskever, now the research director of OpenAI, mentioned that Attention Mechanisms are one of the most exciting advancements, and that they are here to stay. That sounds exciting. But what are Attention Mechanisms?

Attention Mechanisms in Neural Networks are (very) loosely based on the visual attention mechanism found in humans. Human visual attention is well-studied and while there exist different models, all of them essentially come down to being able to focus on a certain region of an image with “high resolution” while perceiving the surrounding image in “low resolution”, and then adjusting the focal point over time.

Continue reading

Recurrent Neural Network Tutorial, Part 4 – Implementing a GRU/LSTM RNN with Python and Theano

The code for this post is on Github. This is part 4, the last part of the Recurrent Neural Network Tutorial. The previous parts are:

In this post we’ll learn about LSTM (Long Short Term Memory) networks and GRUs (Gated Recurrent Units).  LSTMs were first proposed in 1997 by Sepp Hochreiter and Jürgen Schmidhuber, and are among the most widely used models in Deep Learning for NLP today. GRUs, first used in  2014, are a simpler variant of LSTMs that share many of the same properties.  Let’s start by looking at LSTMs, and then we’ll see how GRUs are different.

Continue reading

Recurrent Neural Networks Tutorial, Part 3 – Backpropagation Through Time and Vanishing Gradients

This the third part of the Recurrent Neural Network Tutorial.

In the previous part of the tutorial we implemented a RNN from scratch, but didn’t go into detail on how Backpropagation Through Time (BPTT) algorithms calculates the gradients. In this part we’ll give a brief overview of BPTT and explain how it differs from traditional backpropagation. We will then try to understand the vanishing gradient problem, which has led to the development of  LSTMs and GRUs, two of the currently most popular and powerful models used in NLP (and other areas). The vanishing gradient problem was originally discovered by Sepp Hochreiter in 1991 and has been receiving attention again recently due to the increased application of deep architectures.

Continue reading

Recurrent Neural Networks Tutorial, Part 2 – Implementing a RNN with Python, Numpy and Theano

This the second part of the Recurrent Neural Network Tutorial. The first part is here.

Code to follow along is on Github.

In this part we will implement a full Recurrent Neural Network from scratch using Python and optimize our implementation using Theano, a library to perform operations on a GPU. The full code is available on Github. I will skip over some boilerplate code that is not essential to understanding Recurrent Neural Networks, but all of that is also on Github.

Continue reading