LSTM implementation

By Sara Butler,2015-08-22 13:58
    For a long time I had been looking for a good tutorial on LSTM networks. They seemed very complex, and I had never used them for anything before. A quick search on the Internet didn't help either, because all I could find were some slides.

    Fortunately, I took part in the Kaggle EEG competition, found it very interesting, decided to use an LSTM, and finally understood how it works. This article is based on my solution, which uses Andrej Karpathy's char-rnn code, which I also strongly recommend to you.

    The myth about RNNs

    I feel that one very important point has never been fully stressed (and it is the main reason why I couldn't get an RNN to do what I wanted): an RNN is not very different from a feedforward neural network. The easiest way to implement an RNN is as a feedforward network that takes some of its input from the data and some from the output of its own hidden layer at the previous time step. There is no magical internal state in the network; the state is just treated as part of the input.

    The overall structure of an RNN is very similar to that of a feedforward network.

    LSTM review

    This section covers only the formal definition of LSTM. There are many other good blog posts that describe in detail how to imagine and think about these equations.

    LSTMs come in many variants, but we will explain only a simple one. A cell consists of three gates (input, forget, output) and a cell unit. The gates use a sigmoid activation function, while the input transformation and the cell state usually use tanh. An LSTM cell can be defined by the following equations:


    Input transformation:

        i_t  = sigm(W_xi · x_t + W_hi · h_{t-1} + b_i)   -- input gate
        f_t  = sigm(W_xf · x_t + W_hf · h_{t-1} + b_f)   -- forget gate
        o_t  = sigm(W_xo · x_t + W_ho · h_{t-1} + b_o)   -- output gate
        in_t = tanh(W_xc · x_t + W_hc · h_{t-1} + b_c)   -- cell input

    State update:

        c_t = f_t ∘ c_{t-1} + i_t ∘ in_t
        h_t = o_t ∘ tanh(c_t)

    where ∘ denotes element-wise multiplication. (Figure: a diagram of the LSTM cell, showing the three gates around the cell unit.)
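    To make the update concrete, here is a minimal framework-independent NumPy sketch of one step of a standard LSTM cell (`lstm_step` and all parameter names are my own, not part of the Torch code that follows):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, prev_c, prev_h, W_x, W_h, b, rnn_size):
    """One step of a standard LSTM cell.

    W_x: (input_size, 4*rnn_size), W_h: (rnn_size, 4*rnn_size), b: (4*rnn_size,)
    """
    pre = x @ W_x + prev_h @ W_h + b             # all preactivations at once
    i = sigmoid(pre[:rnn_size])                  # input gate
    f = sigmoid(pre[rnn_size:2 * rnn_size])      # forget gate
    o = sigmoid(pre[2 * rnn_size:3 * rnn_size])  # output gate
    g = np.tanh(pre[3 * rnn_size:])              # cell input transformation
    next_c = f * prev_c + i * g                  # state update
    next_h = o * np.tanh(next_c)                 # new hidden state
    return next_c, next_h
```

    Note that all four linear transformations are computed in a single matrix multiplication; the Torch implementation below uses the same trick.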

    Because of the gating mechanism, the cell can keep a piece of information for long periods of time while it works, and protect the gradient inside the cell from harmful changes during training. Vanilla LSTMs have no forget gate and add the unchanged cell state during the update (this can be seen as a recurrent connection with a constant weight of 1), which is commonly referred to as a Constant Error Carousel (CEC). It is called that because it solves the serious vanishing- and exploding-gradient problems that occur when training RNNs, and thereby makes it possible to learn long-term relationships.
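    A tiny numeric illustration of why the CEC matters (the factors 1.0 and 0.5 below are illustrative stand-ins for the recurrent Jacobian, not values from this article): backpropagating through T time steps multiplies the gradient by the recurrent factor once per step.

```python
T = 50
grad_cec, grad_vanilla = 1.0, 1.0
for _ in range(T):
    grad_cec *= 1.0      # CEC: recurrent self-connection with constant weight 1
    grad_vanilla *= 0.5  # a typical contractive factor in a plain RNN

# grad_cec is still 1.0; grad_vanilla has shrunk to about 8.9e-16
```

    With a constant weight of 1 the error signal survives arbitrarily many steps, while any factor consistently below 1 makes it vanish exponentially.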

    Build your own LSTM layer

    The code in this tutorial uses Torch7. Don't worry if you don't know it. I will explain everything in detail, so you will be able to implement the same algorithm in your favorite framework.

    The network will be an nngraph.gModule, which basically means that we will define a standard neural network computation graph out of nn modules. We will need the following layers:

    - nn.Identity() - passes its input forward unchanged (used to store the input data)
    - nn.Dropout(p) - standard dropout module (randomly drops part of the hidden units during training)
    - nn.Linear(in, out) - an affine transformation from in dimensions to out dimensions
    - nn.Narrow(dim, start, len) - selects a sub-vector along dimension dim, starting at index start, with length len
    - nn.Sigmoid() - applies the sigmoid element-wise
    - nn.Tanh() - applies tanh element-wise
    - nn.CMulTable() - outputs the element-wise product of its input tensors
    - nn.CAddTable() - outputs the sum of its input tensors

    The input

    First, let's define the form of the input. Lua has an array-like object called a table, and the network will accept a table like the one below.

    local inputs = {}
    table.insert(inputs, nn.Identity()()) -- network input
    table.insert(inputs, nn.Identity()()) -- c at time t-1
    table.insert(inputs, nn.Identity()()) -- h at time t-1

    local input = inputs[1]
    local prev_c = inputs[2]
    local prev_h = inputs[3]

    The Identity modules simply copy whatever input we provide to the network into the graph.

    Computing gate values

    To speed up our implementation, we will apply the transformations of the whole LSTM layer at once.

    local i2h = nn.Linear(input_size, 4 * rnn_size)(input) -- input to hidden
    local h2h = nn.Linear(rnn_size, 4 * rnn_size)(prev_h)  -- hidden to hidden
    local preactivations = nn.CAddTable()({i2h, h2h})      -- i2h + h2h
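    The fused linear layer is just the four per-gate weight matrices stacked side by side; a NumPy sketch of the equivalence (all names here are my own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, rnn_size = 3, 4
x = rng.standard_normal(input_size)

# Four separate weight matrices, one per gate plus the cell input...
Ws = [rng.standard_normal((input_size, rnn_size)) for _ in range(4)]
# ...concatenated column-wise into one fused matrix, as nn.Linear(input_size, 4 * rnn_size) holds
W_fused = np.concatenate(Ws, axis=1)

separate = np.concatenate([x @ W for W in Ws])  # four small matmuls
fused = x @ W_fused                             # one big matmul, same numbers
```

    One large matrix multiplication is typically faster than four small ones, which is the whole point of the fused layer.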

    If you are not familiar with nngraph, you may find it strange that, although the modules we constructed in the previous section belong to nn, we call them here as if they were already graph nodes. What actually happens is that the second call converts the nn.Module into a node of the nngraph.gModule, and the argument specifies its parent node in the graph.

    preactivations outputs a vector obtained from a linear transformation of the input and of the previous hidden state. These are raw values that will be used to compute the gate activations and the cell input. The vector is divided into four parts, each of size rnn_size. The first part will be used for the input gate, the second for the forget gate, the third for the output gate, and the last one as the cell input (so the indices of the i-th element of the gates and the cell input are {i, rnn_size + i, 2 * rnn_size + i, 3 * rnn_size + i} respectively).

    Next, we have to apply the nonlinearities; but while all the gates use sigmoid, we use tanh to pre-activate the cell input. Because of that, we will use two nn.Narrow modules, which will select the appropriate parts of the preactivation vector.

    -- gates
    local pre_sigmoid_chunk = nn.Narrow(2, 1, 3 * rnn_size)(preactivations)
    local all_gates = nn.Sigmoid()(pre_sigmoid_chunk)

    -- input
    local in_chunk = nn.Narrow(2, 3 * rnn_size + 1, rnn_size)(preactivations)
    local in_transform = nn.Tanh()(in_chunk)

    After the nonlinearities, we need to apply a few more nn.Narrow modules, and then we are done with the gates:

    local in_gate = nn.Narrow(2, 1, rnn_size)(all_gates)
    local forget_gate = nn.Narrow(2, rnn_size + 1, rnn_size)(all_gates)
    local out_gate = nn.Narrow(2, 2 * rnn_size + 1, rnn_size)(all_gates)
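    In array terms, these nn.Narrow calls are plain slices of the preactivation vector. A NumPy sketch of the same index arithmetic (rnn_size = 4 and the running values are arbitrary choices of mine):

```python
import numpy as np

rnn_size = 4
preactivations = np.arange(4 * rnn_size, dtype=float)  # stand-in for i2h + h2h

# nn.Narrow(2, 1, 3*rnn_size): the first three chunks feed the sigmoid gates
pre_sigmoid_chunk = preactivations[:3 * rnn_size]
# nn.Narrow(2, 3*rnn_size+1, rnn_size): the last chunk is the tanh cell input
in_chunk = preactivations[3 * rnn_size:]

# splitting the gate chunk again, like the three nn.Narrow calls above
in_gate_pre = pre_sigmoid_chunk[:rnn_size]
forget_gate_pre = pre_sigmoid_chunk[rnn_size:2 * rnn_size]
out_gate_pre = pre_sigmoid_chunk[2 * rnn_size:]
```

    The only difference is that Torch's Narrow uses 1-based start indices, while NumPy slices are 0-based.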

    The cell and the hidden state

    Having computed the gate values, we can now calculate the current cell state. All this requires is two nn.CMulTable modules (one for forget_gate ∘ prev_c and one for in_gate ∘ in_transform), and an nn.CAddTable to sum them into the current cell state.

    -- previous cell state contribution
    local c_forget = nn.CMulTable()({forget_gate, prev_c})
    -- input contribution
    local c_input = nn.CMulTable()({in_gate, in_transform})
    -- next cell state
    local next_c = nn.CAddTable()({
        c_forget,
        c_input
    })

    Finally, it's time to compute the hidden state. This is the easiest part, because it just applies tanh to the current cell state (nn.Tanh) and multiplies the result by the output gate (nn.CMulTable).

    local c_transform = nn.Tanh()(next_c)
    local next_h = nn.CMulTable()({out_gate, c_transform})

    Defining the module

    Now, if you want to export the whole graph as a standalone module, you can wrap it up with the following code:

    -- module outputs
    outputs = {}
    table.insert(outputs, next_c)
    table.insert(outputs, next_h)

    -- packs the graph into a convenient module with standard API (:forward(), :backward())
    return nn.gModule(inputs, outputs)

    Example

    The LSTM layer implementation can be obtained here. You can use it like this:

    th> LSTM = require 'LSTM.lua'
    th> layer = LSTM.create(3, 2)
    th> layer:forward({torch.randn(1,3), torch.randn(1,2), torch.randn(1,2)})
    {
      1 : DoubleTensor - size: 1x2
      2 : DoubleTensor - size: 1x2
    }



    To make a multi-layer LSTM network, you can construct the subsequent layers in a for loop, taking the next_h of one layer as the input of the next. You can check this example.
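    The stacking idea can be sketched framework-independently; below is a hypothetical NumPy version that wires the hidden state of each layer into the layer above (all names and sizes are my own illustration, not from the linked example):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, prev_c, prev_h, W_x, W_h, b, rnn_size):
    # preactivations: [input gate | forget gate | output gate | cell input]
    pre = x @ W_x + prev_h @ W_h + b
    i = sigmoid(pre[:rnn_size])
    f = sigmoid(pre[rnn_size:2 * rnn_size])
    o = sigmoid(pre[2 * rnn_size:3 * rnn_size])
    g = np.tanh(pre[3 * rnn_size:])
    next_c = f * prev_c + i * g
    return next_c, o * np.tanh(next_c)

rng = np.random.default_rng(0)
input_size, rnn_size, n_layers = 3, 2, 2

# one parameter set per layer; layer 0 sees the data, deeper layers see next_h
layers = []
for l in range(n_layers):
    in_dim = input_size if l == 0 else rnn_size
    layers.append((rng.standard_normal((in_dim, 4 * rnn_size)),
                   rng.standard_normal((rnn_size, 4 * rnn_size)),
                   np.zeros(4 * rnn_size)))

c = [np.zeros(rnn_size) for _ in range(n_layers)]
h = [np.zeros(rnn_size) for _ in range(n_layers)]

layer_input = rng.standard_normal(input_size)  # network input at time t
for l, (W_x, W_h, b) in enumerate(layers):
    c[l], h[l] = lstm_step(layer_input, c[l], h[l], W_x, W_h, b, rnn_size)
    layer_input = h[l]  # next_h of this layer feeds the layer above
```

    Each layer keeps its own (c, h) pair across time steps; only next_h travels upward through the stack.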
