LSTM implementation

By Sara Butler, 2015-08-22 13:58

    For a long time I had been looking for a good tutorial on LSTM networks. They seemed very complicated, and I had never used them for anything before. A quick search on the Internet didn't help, because all I found were some slides.

    Fortunately, I took part in the Kaggle EEG competition, thought it would be interesting to use LSTMs there, and finally came to understand how they work. This article is based on my solution and uses Andrej Karpathy's char-rnn code, which I also strongly recommend to you.

    RNN myth

    There is one important point that I feel has not been stressed enough (and it is the main reason I could not get RNNs to do what I wanted). An RNN is not very different from a feedforward neural network. The easiest way to implement an RNN is as a feedforward network in which part of the input to the hidden layer comes from the actual input and part comes from the hidden layer's output at the previous step. There is no magical internal state kept inside the network. The state is simply provided as part of the input.

    The overall structure of an RNN is very similar to that of a feedforward network.
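
    To make this concrete, here is a minimal sketch in plain Python of a single RNN step written as an ordinary function. It is not the article's Torch7 code. The names `rnn_step`, `run_sequence`, `W_xh`, `W_hh` and `b_h` are assumptions chosen for illustration, and tanh is assumed as the hidden activation. The point is only that the previous hidden output is passed in as an argument, just like the input.

```python
import math

def rnn_step(x, h_prev, W_xh, W_hh, b_h):
    """One step of a plain RNN, written as a stateless feedforward function.

    x          : list of input values for this time step
    h_prev     : list of hidden values produced by the previous step
    W_xh, W_hh : weight matrices given as lists of rows
    b_h        : list of biases, one per hidden unit
    """
    h_new = []
    for row_x, row_h, b in zip(W_xh, W_hh, b_h):
        pre = b
        pre += sum(w * v for w, v in zip(row_x, x))       # contribution of the input
        pre += sum(w * v for w, v in zip(row_h, h_prev))  # contribution of the previous hidden output
        h_new.append(math.tanh(pre))
    return h_new

def run_sequence(xs, h0, W_xh, W_hh, b_h):
    """The 'recurrence' is just the caller feeding each output back in."""
    h = h0
    for x in xs:
        h = rnn_step(x, h, W_xh, W_hh, b_h)
    return h

# Hypothetical toy sizes: one input value, one hidden unit.
final_h = run_sequence([[1.0], [1.0]], [0.0], [[1.0]], [[1.0]], [0.0])
```

    Nothing is stored between calls to `rnn_step`. All memory of earlier steps lives in the `h` value that the loop passes along.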

    LSTM review

    This section covers only the formal definition of the LSTM. There are many other good blog posts that describe in detail how to visualize and think about these equations.

    LSTMs come in many variants, but we will explain only a simple one. A cell consists of three gates (input, forget and output) and a cell unit. The gates use a sigmoid activation function, while the input and the cell state are usually transformed with tanh. An LSTM cell can be defined by the following equations:


    Input transformation:

    State update:
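
    For reference, one widely used way of writing the gate, input transformation and state update equations is sketched below. It matches the description above (three sigmoid gates, tanh applied to the input and to the cell state), but the symbol names are assumptions rather than the author's own notation: x_t is the input, h_{t-1} the previous hidden output, c_t the cell state, g_t the transformed input, \sigma the sigmoid and \odot the element-wise product.

```latex
\begin{aligned}
% Gates
i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i) \\
f_t &= \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f) \\
o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o) \\
% Input transformation
g_t &= \tanh(W_{xg} x_t + W_{hg} h_{t-1} + b_g) \\
% State update
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```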

    It can be pictured roughly as follows:

    Thanks to the gating mechanism, the cell can hold a piece of information for long periods of time during operation, and the gradient inside the cell is protected from harmful changes during training. The vanilla LSTM has no forget gate and carries the previous cell state into the update unchanged (this can be seen as a recurrent connection with a constant weight of 1), which is commonly referred to as a Constant Error Carousel (CEC). It is so named because it addresses a serious problem in RNN training, namely vanishing and exploding gradients, which in turn makes it possible to learn long-term relationships.
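
    The following is a minimal sketch of one LSTM step in plain Python, assuming the common formulation with three sigmoid gates and a tanh input transformation. It is not the article's Torch7 code. The names `lstm_step`, `affine`, `params` and the gate keys 'i', 'f', 'o', 'g' are assumptions made for illustration. The toy run at the end pushes the forget gate close to 1 and the input gate close to 0 through their biases, which approximates the constant-weight-1 recurrence described above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W_x, W_h, b, x, h_prev):
    """Compute W_x * x + W_h * h_prev + b with plain Python lists."""
    return [
        sum(w * v for w, v in zip(rx, x))
        + sum(w * v for w, v in zip(rh, h_prev))
        + bi
        for rx, rh, bi in zip(W_x, W_h, b)
    ]

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step; `params` maps 'i', 'f', 'o', 'g' to (W_x, W_h, b)."""
    i = [sigmoid(z) for z in affine(*params["i"], x, h_prev)]    # input gate
    f = [sigmoid(z) for z in affine(*params["f"], x, h_prev)]    # forget gate
    o = [sigmoid(z) for z in affine(*params["o"], x, h_prev)]    # output gate
    g = [math.tanh(z) for z in affine(*params["g"], x, h_prev)]  # input transformation
    # State update: keep part of the old state, add part of the new input.
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c

# Hypothetical toy setup: one input value, one hidden unit.
# Forget gate saturated near 1, input gate near 0 (assumed bias values).
params = {
    "i": ([[0.0]], [[0.0]], [-20.0]),
    "f": ([[0.0]], [[0.0]], [20.0]),
    "o": ([[0.0]], [[0.0]], [0.0]),
    "g": ([[1.0]], [[0.0]], [0.0]),
}
h, c = [0.0], [0.5]
for _ in range(50):
    h, c = lstm_step([1.0], h, c, params)
```

    With these settings the cell state stays very close to its starting value across all 50 steps, which is the behaviour that lets information (and, during training, the error signal) survive over long spans.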

    Build your own LSTM layer

    The code in this tutorial uses Torch7. Don't worry if you are not familiar with it. I'll explain everything in detail, so you can use your favorite framework to implement the same algorith