
# LSTM implementation

By Sara Butler, 2015-08-22 13:58

Preface

For a long time I had been looking for a good tutorial on LSTM networks. They seemed very complex, and I had never used them for anything before. A quick search on the Internet didn't help either, because all I could find were slides.

Fortunately, I took part in the Kaggle EEG competition, found it very interesting, and decided to use an LSTM; in the process I finally understood how it works. This article is based on my solution, which uses Andrej Karpathy's char-rnn code, which I also strongly recommend to you.

The RNN myth

I feel there is one very important point that is rarely stressed enough (and it is the main reason why I couldn't get RNNs to do what I wanted for so long): RNNs are not very different from feedforward neural networks. The easiest way to implement an RNN is as a feedforward network that takes part of its input from the data and part from the output of the hidden layer at the previous time step. There is no magical internal state in the network; the state is simply part of the input.

The overall structure of an RNN is very similar to that of a feedforward network.
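To make this point concrete, here is a minimal sketch (in Python/NumPy, not the Torch7 code used later in this article; all names are illustrative) of a single RNN step treated as an ordinary feedforward layer whose input is just the current data concatenated with the previous hidden state:

```python
import numpy as np

def rnn_step(x, h_prev, W, b):
    """One RNN step as a plain feedforward layer: the 'state' is just
    another part of the input, concatenated with the data."""
    inp = np.concatenate([x, h_prev])      # (input_size + rnn_size,)
    return np.tanh(W @ inp + b)            # new hidden state, (rnn_size,)

input_size, rnn_size = 3, 2
rng = np.random.default_rng(0)
W = rng.standard_normal((rnn_size, input_size + rnn_size))
b = np.zeros(rnn_size)

h = np.zeros(rnn_size)                     # initial state
for x in rng.standard_normal((5, input_size)):  # unroll over 5 time steps
    h = rnn_step(x, h, W, b)
print(h.shape)  # (2,)
```

The "recurrence" lives entirely in the loop that feeds each step's output back in as part of the next step's input; the layer itself is an ordinary affine-plus-nonlinearity.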

LSTM review

This section covers only the formal definition of the LSTM. There are many other good blog posts that describe in detail how to visualize and think about these equations.

LSTMs come in many variants, but we will explain only a simple one. A cell consists of three gates (input, forget, output) and one cell unit. The gates use a sigmoid activation function, while the input and the cell state are usually transformed with tanh. An LSTM cell can be defined by the following equations:

Gates:

$$i_t = \mathrm{sigm}(W_{xi} x_t + W_{hi} h_{t-1} + b_i)$$
$$f_t = \mathrm{sigm}(W_{xf} x_t + W_{hf} h_{t-1} + b_f)$$
$$o_t = \mathrm{sigm}(W_{xo} x_t + W_{ho} h_{t-1} + b_o)$$

Input transformation:

$$c\_in_t = \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_{c\_in})$$

State update:

$$c_t = f_t \cdot c_{t-1} + i_t \cdot c\_in_t$$
$$h_t = o_t \cdot \tanh(c_t)$$

Described with a picture, the cell looks similar to the diagram below. (Figure: LSTM cell diagram.)
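As a cross-check of the equations above, here is a minimal NumPy sketch of a single cell step (this is illustrative Python, not the Torch7 code used in the rest of the article; parameter names are assumptions). The four weight blocks are stacked in the order in gate, forget gate, out gate, cell input, matching the layout used later in this article:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, c_prev, h_prev, Wx, Wh, b, rnn_size):
    """One LSTM step following the equations above. Wx, Wh, b hold the
    parameters of all three gates and the cell input, stacked."""
    pre = Wx @ x + Wh @ h_prev + b                 # (4 * rnn_size,)
    i = sigmoid(pre[0:rnn_size])                   # in gate
    f = sigmoid(pre[rnn_size:2 * rnn_size])        # forget gate
    o = sigmoid(pre[2 * rnn_size:3 * rnn_size])    # out gate
    g = np.tanh(pre[3 * rnn_size:4 * rnn_size])    # cell input c_in
    c = f * c_prev + i * g                         # state update
    h = o * np.tanh(c)                             # hidden output
    return c, h

input_size, rnn_size = 3, 2
rng = np.random.default_rng(1)
Wx = rng.standard_normal((4 * rnn_size, input_size))
Wh = rng.standard_normal((4 * rnn_size, rnn_size))
b = np.zeros(4 * rnn_size)

c, h = lstm_step(rng.standard_normal(input_size),
                 np.zeros(rnn_size), np.zeros(rnn_size),
                 Wx, Wh, b, rnn_size)
print(c.shape, h.shape)  # (2,) (2,)
```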

Thanks to the gating mechanism, the cell can hold on to a piece of information for long periods of time while working, and keep its internal gradient protected from harmful changes during training. A vanilla LSTM has no forget gate and adds the unchanged cell state during the update (it can be seen as a recurrent link with a constant weight of 1), commonly referred to as a Constant Error Carousel (CEC). It is so named because it solves the serious vanishing and exploding gradient problems that occur when training RNNs, which makes learning long-term relationships possible.

The code for this tutorial uses Torch7. Don't worry if you don't know it. I will explain everything in detail, so you can implement the same algorithm in your favorite framework.

The network will be expressed as an nngraph.gModule, which basically means that we define a standard neural network computation graph out of nn modules. We will need the following layers:

- nn.Identity() - passes its input on unchanged (used to store the network's input data)
- nn.Dropout(p) - the standard dropout module (drops units with probability p)
- nn.Linear(in, out) - an affine transformation from in to out dimensions
- nn.Narrow(dim, start, len) - selects a sub-vector along dimension dim, starting at index start, with length len
- nn.Sigmoid() - applies the sigmoid element-wise
- nn.Tanh() - applies tanh element-wise
- nn.CMulTable() - outputs the element-wise product of its input tensors
- nn.CAddTable() - outputs the sum of its input tensors

The input

First, let's define the input structure. In Lua, array-like objects are called tables, and the network will accept a table of inputs like the following.

```lua
local inputs = {}
table.insert(inputs, nn.Identity()()) -- network input
table.insert(inputs, nn.Identity()()) -- c at time t-1
table.insert(inputs, nn.Identity()()) -- h at time t-1
local input = inputs[1]
local prev_c = inputs[2]
local prev_h = inputs[3]
```

The Identity modules simply copy the inputs we provide to the network into the graph.

Calculating the gate values

To speed up our implementation, we will apply the linear transformations for the whole LSTM layer at once.

```lua
local i2h = nn.Linear(input_size, 4 * rnn_size)(input)  -- input to hidden
local h2h = nn.Linear(rnn_size, 4 * rnn_size)(prev_h)   -- hidden to hidden
local preactivations = nn.CAddTable()({i2h, h2h})       -- i2h + h2h
```

If you are not familiar with nngraph, this might seem strange: in the previous section we built nn.Module instances, yet here we seem to call graph nodes. What actually happens is that the second call converts the nn.Module into an nngraph.gModule node, and its argument specifies the node's parent in the graph.

The preactivations output is a vector obtained from a linear transformation of the input and the previous hidden state. These are raw values that will be used to compute the gate activations and the cell input. The vector is divided into four parts, each of size rnn_size. The first part will be used for the in gates, the second for the forget gates, the third for the out gates, and the last one as the cell input (so the indices corresponding to the i-th unit's gates and cell input are {i, rnn_size + i, 2 · rnn_size + i, 3 · rnn_size + i}).

Next, we have to apply a nonlinearity; but while all the gates use the sigmoid, we use tanh to pre-activate the cell input. Because of this, we will use two nn.Narrow modules, which select the appropriate parts of the preactivation vector.

```lua
-- gates
local pre_sigmoid_chunk = nn.Narrow(2, 1, 3 * rnn_size)(preactivations)
local all_gates = nn.Sigmoid()(pre_sigmoid_chunk)

-- input
local in_chunk = nn.Narrow(2, 3 * rnn_size + 1, rnn_size)(preactivations)
local in_transform = nn.Tanh()(in_chunk)
```

After the nonlinearities, we need to add a few more nn.Narrow modules, and then we are done with the gates.

```lua
local in_gate = nn.Narrow(2, 1, rnn_size)(all_gates)
local forget_gate = nn.Narrow(2, rnn_size + 1, rnn_size)(all_gates)
local out_gate = nn.Narrow(2, 2 * rnn_size + 1, rnn_size)(all_gates)
```
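For readers following along in another framework: the nn.Narrow calls above amount to plain slicing along the second (feature) dimension of the batched preactivation matrix. A NumPy equivalent of this chunking (illustrative names, not part of the article's Torch7 code) might look like:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rnn_size = 2
rng = np.random.default_rng(2)
preactivations = rng.standard_normal((1, 4 * rnn_size))  # batch of 1

# nn.Narrow(2, 1, 3 * rnn_size) -> first three chunks, sigmoid together
all_gates = sigmoid(preactivations[:, :3 * rnn_size])
# nn.Narrow(2, 3 * rnn_size + 1, rnn_size) -> last chunk, tanh-transformed
in_transform = np.tanh(preactivations[:, 3 * rnn_size:])

# nn.Narrow(2, ...) on all_gates -> the three individual gates
in_gate = all_gates[:, :rnn_size]
forget_gate = all_gates[:, rnn_size:2 * rnn_size]
out_gate = all_gates[:, 2 * rnn_size:]
print(in_gate.shape, forget_gate.shape, out_gate.shape, in_transform.shape)
```

Note the off-by-one between the two: Torch's Narrow uses 1-based start indices, while the Python slices are 0-based.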

The Cell and the hidden state

Having computed the gate values, we can now calculate the current cell state. All this requires is two nn.CMulTable modules (one for forget_gate * prev_c, one for in_gate * in_transform), and an nn.CAddTable to add them up into the current cell state.

```lua
-- previous cell state contribution
local c_forget = nn.CMulTable()({forget_gate, prev_c})
-- input contribution
local c_input = nn.CMulTable()({in_gate, in_transform})
-- next cell state
local next_c = nn.CAddTable()({
  c_forget,
  c_input
})
```

Finally, it's time to compute the hidden state output. This is the easiest part, because it just applies tanh to the current cell state (nn.Tanh) and multiplies it by the output gate (nn.CMulTable).

```lua
local c_transform = nn.Tanh()(next_c)
local next_h = nn.CMulTable()({out_gate, c_transform})
```

Defining the module

Now, if you want to export the whole graph as a standalone module, you can wrap it up with the following code:

```lua
-- module outputs
outputs = {}
table.insert(outputs, next_c)
table.insert(outputs, next_h)

-- packs the graph into a convenient module with standard API (:forward(), :backward())
return nn.gModule(inputs, outputs)
```

An example

The LSTM layer can be obtained here. You can also use it like this:

```
th> LSTM = require 'LSTM.lua'
[0.0224s]
th> layer = LSTM.create(3, 2)
[0.0019s]
th> layer:forward({torch.randn(1,3), torch.randn(1,2), torch.randn(1,2)})
{
  1 : DoubleTensor - size: 1x2
  2 : DoubleTensor - size: 1x2
}
[0.0005s]
```

To build a multilayer LSTM network, you can construct the subsequent layers in a for loop, using the next_h of one layer as the input of the next layer. You can view this example.
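The stacking loop can be sketched as follows (again in illustrative NumPy, not the article's Torch7 code; names are assumptions). Each layer keeps its own (c, h) pair, and one layer's next_h becomes the next layer's input:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, c_prev, h_prev, Wx, Wh, b, n):
    """One LSTM step; gates and cell input stacked as (i, f, o, g)."""
    pre = Wx @ x + Wh @ h_prev + b
    i, f, o = (sigmoid(pre[k * n:(k + 1) * n]) for k in range(3))
    g = np.tanh(pre[3 * n:])
    c = f * c_prev + i * g
    return c, o * np.tanh(c)

input_size, rnn_size, n_layers = 3, 2, 3
rng = np.random.default_rng(3)

# per-layer parameters; only the first layer sees the raw input size
layers = []
for k in range(n_layers):
    in_size = input_size if k == 0 else rnn_size
    layers.append((rng.standard_normal((4 * rnn_size, in_size)),
                   rng.standard_normal((4 * rnn_size, rnn_size)),
                   np.zeros(4 * rnn_size)))

states = [(np.zeros(rnn_size), np.zeros(rnn_size))] * n_layers
x = rng.standard_normal(input_size)
for k, (Wx, Wh, b) in enumerate(layers):
    c, h = lstm_step(x, *states[k], Wx, Wh, b, rnn_size)
    states[k] = (c, h)
    x = h  # this layer's next_h becomes the next layer's input
print(x.shape)  # (2,)
```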
