Long Short-Term Memory (LSTM)

We recommend reading the RNN article before diving in to LSTMs.

Long Short Term Memory networks (LSTMs) are a special kind of recurrent neural network (RNN) capable of learning long-term dependencies in sequence data. LSTMs were created to overcome the inability to retain information for long periods of time, which is an inherent limitation in RNNs.

The following image illustrates the structure of a vanilla RNN:

‚Äč

LSTMs have four activation functions which are used to save pertinent information to be used in later stages of training.

Source: Christopher Olah

The horizontal line running across the cells illustrates the cell state, a channel for information to flow across the entire chain. The cell state can be updated if the information is deemed pertinent (regulated by gates in each cell). With this novel refinement to RNNs, LSTMs were explicitly designed to capture, retain, and re-use key information over long sequences.