The code for this post is on Github. This is part 4, the last part of the Recurrent Neural Network Tutorial. The previous parts are:
In this post we’ll learn about LSTM (Long Short Term Memory) networks and GRUs (Gated Recurrent Units). LSTMs were first proposed in 1997 by Sepp Hochreiter and Jürgen Schmidhuber, and are among the most widely used models in Deep Learning for NLP today. GRUs, first used in 2014, are a simpler variant of LSTMs that share many of the same properties. Let’s start by looking at LSTMs, and then we’ll see how GRUs are different.
Continue reading “Recurrent Neural Network Tutorial, Part 4 – Implementing a GRU/LSTM RNN with Python and Theano”
This the third part of the Recurrent Neural Network Tutorial.
In the previous part of the tutorial we implemented a RNN from scratch, but didn’t go into detail on how Backpropagation Through Time (BPTT) algorithms calculates the gradients. In this part we’ll give a brief overview of BPTT and explain how it differs from traditional backpropagation. We will then try to understand the vanishing gradient problem, which has led to the development of LSTMs and GRUs, two of the currently most popular and powerful models used in NLP (and other areas). The vanishing gradient problem was originally discovered by Sepp Hochreiter in 1991 and has been receiving attention again recently due to the increased application of deep architectures.
Continue reading “Recurrent Neural Networks Tutorial, Part 3 – Backpropagation Through Time and Vanishing Gradients”
This the second part of the Recurrent Neural Network Tutorial. The first part is here.
Code to follow along is on Github.
In this part we will implement a full Recurrent Neural Network from scratch using Python and optimize our implementation using Theano, a library to perform operations on a GPU. The full code is available on Github. I will skip over some boilerplate code that is not essential to understanding Recurrent Neural Networks, but all of that is also on Github.
Continue reading “Recurrent Neural Networks Tutorial, Part 2 – Implementing a RNN with Python, Numpy and Theano”
Recurrent Neural Networks (RNNs) are popular models that have shown great promise in many NLP tasks. But despite their recent popularity I’ve only found a limited number of resources that throughly explain how RNNs work, and how to implement them. That’s what this tutorial is about. It’s a multi-part series in which I’m planning to cover the following:
- Introduction to RNNs (this post)
- Implementing a RNN using Python and Theano
- Understanding the Backpropagation Through Time (BPTT) algorithm and the vanishing gradient problem
- Implementing a GRU/LSTM RNN
As part of the tutorial we will implement a recurrent neural network based language model. The applications of language models are two-fold: First, it allows us to score arbitrary sentences based on how likely they are to occur in the real world. This gives us a measure of grammatical and semantic correctness. Such models are typically used as part of Machine Translation systems. Secondly, a language model allows us to generate new text (I think that’s the much cooler application). Training a language model on Shakespeare allows us to generate Shakespeare-like text. This fun post by Andrej Karpathy demonstrates what character-level language models based on RNNs are capable of.
Continue reading “Recurrent Neural Networks Tutorial, Part 1 – Introduction to RNNs”
Get the code: The full code is available as an Jupyter/iPython Notebook on Github!
In a previous blog post we build a simple Neural Network from scratch. Let’s build on top of this and speed up our code using the Theano library. With Theano we can make our code not only faster, but also more concise!
What is Theano?
Theano describes itself as a Python library that lets you to define, optimize, and evaluate mathematical expressions, especially ones with multi-dimensional arrays. The way I understand Theano is that it allows me to define graphs of computations. Under the hood Theano optimizes these computations in a variety of ways, including avoiding redundant calculations, generating optimized C code, and (optionally) using the GPU. Theano also has the capability to automatically differentiate mathematical expressions. By modeling computations as graphs it can calculate complex gradients using the chain rule. This means we no longer need to compute the gradients ourselves!
Continue reading “Speeding up your Neural Network with Theano and the GPU”
Get the code: To follow along, all the code is also available as an iPython notebook on Github.
In this post we will implement a simple 3-layer neural network from scratch. We won’t derive all the math that’s required, but I will try to give an intuitive explanation of what we are doing. I will also point to resources for you read up on the details.
Here I’m assuming that you are familiar with basic Calculus and Machine Learning concepts, e.g. you know what classification and regularization is. Ideally you also know a bit about how optimization techniques like gradient descent work. But even if you’re not familiar with any of the above this post could still turn out to be interesting ;)
But why implement a Neural Network from scratch at all? Even if you plan on using Neural Network libraries like PyBrain in the future, implementing a network from scratch at least once is an extremely valuable exercise. It helps you gain an understanding of how neural networks work, and that is essential for designing effective models.
Continue reading “Implementing a Neural Network from Scratch in Python – An Introduction”