Convolutional neural networks and feature extraction with Python
Convolutional neural networks (ConvNets) are biologically-inspired variants of MLPs (multilayer perceptrons). They have different kinds of layers, and each layer works differently from an ordinary MLP layer. If you are interested in learning more about ConvNets, a very good tutorial is CS231n – Convolutional Neural Networks for Visual Recognition. The architecture of a ConvNet is shown below:
A conventional neural network (from the CS231n website)
A ConvNet architecture (from the CS231n website)
As you can see, ConvNets work with 3D volumes and with transformations of these 3D volumes. I won't repeat the whole CS231n tutorial in this article, so if you are really interested, please take the time to read it before continuing.
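To build intuition for how these 3D volumes shrink as they pass through a ConvNet, it helps to compute the spatial output size of a convolutional or pooling layer. Below is a minimal sketch of the standard output-size formula; the function name and example values are mine, chosen to match the 28x28 MNIST images used later in this article:

```python
def conv_output_size(input_size, filter_size, padding=0, stride=1):
    """Spatial output size of a convolution (or pooling) window."""
    return (input_size - filter_size + 2 * padding) // stride + 1

# a 28x28 MNIST image through a 5x5 convolution (no padding, stride 1)
after_conv = conv_output_size(28, 5)
# ... followed by a 2x2 max-pooling with stride 2
after_pool = conv_output_size(after_conv, 2, stride=2)
print(after_conv, after_pool)  # 24 12
```

These are exactly the 24x24 and 12x12 feature-map sizes that will appear in the layer-information table when we train the network later.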
Lasagne and nolearn
Lasagne and nolearn are my favorite Python packages for deep learning. Lasagne is based on Theano, so GPU acceleration makes a real difference, and its declarative way of creating neural networks is very helpful. nolearn is a collection of utilities around neural network packages (including Lasagne) that can greatly help us create the network architecture and inspect each layer.
In this article I will show how to build a simple ConvNet architecture using some convolutional and pooling layers. I will also show how a trained ConvNet can be used as a feature extractor, with the extracted features then fed into different models such as SVMs, Logistic Regression, and so on. Most people take a trained ConvNet model, remove the last output layer, and extract features for their own datasets from ConvNets trained on ImageNet. This is commonly called transfer learning, because the other layers of a ConvNet can be reused for different problems: since the filters in the first layer of a ConvNet act as edge detectors, they can serve as generic feature detectors for many other problems.
Load the MNIST dataset
The MNIST dataset is one of the most traditional datasets for digit classification. We will use the preprocessed Python version of it, but first let's import the packages we will need:

import os
import gzip
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from urllib import urlretrieve
import cPickle as pickle
import numpy as np
import theano
from lasagne import layers
from lasagne.updates import nesterov_momentum
from nolearn.lasagne import NeuralNet
from nolearn.lasagne import visualize
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
As you can see, we imported matplotlib for plotting, some native Python modules for downloading the MNIST dataset, numpy, theano, lasagne, nolearn, and some functions from the scikit-learn library for model evaluation.
Next, we define a function to load the MNIST dataset (it is very similar to the one used in the Lasagne tutorial):
def load_dataset():
    url = 'http://deeplearning.net/data/mnist/mnist.pkl.gz'
    filename = 'mnist.pkl.gz'
    if not os.path.exists(filename):
        print("Downloading MNIST dataset...")
        urlretrieve(url, filename)
    with gzip.open(filename, 'rb') as f:
        data = pickle.load(f)
    X_train, y_train = data[0]
    X_val, y_val = data[1]
    X_test, y_test = data[2]
    X_train = X_train.reshape((-1, 1, 28, 28))
    X_val = X_val.reshape((-1, 1, 28, 28))
    X_test = X_test.reshape((-1, 1, 28, 28))
    y_train = y_train.astype(np.uint8)
    y_val = y_val.astype(np.uint8)
    y_test = y_test.astype(np.uint8)
    return X_train, y_train, X_val, y_val, X_test, y_test
As you can see, we download the preprocessed MNIST dataset and split it into three different sets: a training set, a validation set, and a test set. We then reshape the image data to prepare it for the Lasagne input layer and, due to GPU/Theano datatype restrictions, we also convert the numpy label arrays to uint8.
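The reshape and conversion steps above can be illustrated in isolation. Here is a small sketch with random data standing in for the real MNIST arrays; the shapes are the real ones, but the data is fake:

```python
import numpy as np

# the pickled MNIST images come as flat 784-dimensional rows (fake data here)
fake_X = np.random.rand(100, 784).astype(np.float32)
fake_y = np.random.randint(0, 10, size=100)

# reshape to (batch, channels, height, width) as Lasagne's input layer expects
X = fake_X.reshape((-1, 1, 28, 28))
# convert the labels to uint8 for the GPU/Theano pipeline
y = fake_y.astype(np.uint8)

print(X.shape, y.dtype)  # (100, 1, 28, 28) uint8
```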
Now we are ready to load the MNIST dataset and take a look at one of its instances:

X_train, y_train, X_val, y_val, X_test, y_test = load_dataset()
plt.imshow(X_train[0][0], cmap=cm.binary)

This code will output the image below (I am using an IPython Notebook):
A digit instance from the MNIST dataset (this instance is a 5)
ConvNet architecture and training
Now we define our ConvNet architecture and then train it using a GPU/CPU (I have a very cheap GPU, but it helps a lot):
from lasagne.nonlinearities import softmax

net1 = NeuralNet(
    layers=[('input', layers.InputLayer),
            ('conv2d1', layers.Conv2DLayer), ('maxpool1', layers.MaxPool2DLayer),
            ('conv2d2', layers.Conv2DLayer), ('maxpool2', layers.MaxPool2DLayer),
            ('dropout1', layers.DropoutLayer), ('dense', layers.DenseLayer),
            ('dropout2', layers.DropoutLayer), ('output', layers.DenseLayer)],
    # input layer
    input_shape=(None, 1, 28, 28),
    # layer conv2d1
    conv2d1_num_filters=32, conv2d1_filter_size=(5, 5),
    # layer maxpool1
    maxpool1_pool_size=(2, 2),
    # layer conv2d2
    conv2d2_num_filters=32, conv2d2_filter_size=(5, 5),
    # layer maxpool2
    maxpool2_pool_size=(2, 2),
    # dropout, dense and output layers
    dropout1_p=0.5, dense_num_units=256, dropout2_p=0.5,
    output_num_units=10, output_nonlinearity=softmax,
    # optimization method params
    update=nesterov_momentum, update_learning_rate=0.01, update_momentum=0.9,
    max_epochs=10, verbose=1)

# Train the network
nn = net1.fit(X_train, y_train)
As you can see, in the layers parameter we define a list of tuples with the layer names/types, and then we define the parameters of these layers. Here, our architecture uses two convolutional layers, two pooling layers, one fully-connected layer (dense layer) and one output layer. There are also dropout layers between some of them; a dropout layer is a regularizer that randomly sets input values to zero to avoid overfitting (see the image below).
Dropout layer effect (from CS231n website)
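Dropout itself is easy to sketch in plain numpy. The snippet below is only an illustration of the training-time behaviour (with the common "inverted dropout" scaling), not nolearn's actual implementation:

```python
import numpy as np

def dropout(x, p=0.5, rng=None):
    """Inverted dropout: zero each input with probability p, scale survivors by 1/(1-p)."""
    rng = rng or np.random.RandomState(0)
    mask = rng.binomial(1, 1.0 - p, size=x.shape)
    return x * mask / (1.0 - p)

x = np.ones((4, 8))
out = dropout(x, p=0.5)
# roughly half the activations are zeroed; survivors are scaled to 1/(1-p) = 2.0
print(sorted(np.unique(out)))  # [0.0, 2.0]
```

At test time dropout is disabled, which is why we will later pass deterministic=True when compiling Theano functions over the trained network.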
After calling the training method, the nolearn package will show the status of the learning process; my machine, with its humble GPU, produced the results below:
# Neural Network with 160362 learnable parameters

## Layer information

  #  name      size
---  --------  --------
  0  input     1x28x28
  1  conv2d1   32x24x24
  2  maxpool1  32x12x12
  3  conv2d2   32x8x8
  4  maxpool2  32x4x4
  5  dropout1  32x4x4
  6  dense     256
  7  dropout2  256
  8  output    10
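The 160362 learnable parameters reported above can be checked by hand from the layer sizes in the table. The arithmetic below is the standard "weights + biases" count for each parameterized layer; only the shapes from the table are assumed:

```python
# weights + biases for every layer with learnable parameters
conv2d1 = 32 * (1 * 5 * 5) + 32     # 32 filters over 1 input channel, 5x5 each
conv2d2 = 32 * (32 * 5 * 5) + 32    # 32 filters over 32 channels, 5x5 each
dense   = (32 * 4 * 4) * 256 + 256  # flattened maxpool2 output into 256 units
output  = 256 * 10 + 10             # 256 units into 10 digit classes

total = conv2d1 + conv2d2 + dense + output
print(total)  # 160362
```

Note that the pooling and dropout layers contribute no parameters, which is why they do not appear in the sum.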
 epoch    train loss    valid loss    train/val    valid acc  dur
-------  ------------  ------------  -----------  ---------  ------
1 0.85204 0.16707 5.09977 0.95174 33.71s
2 0.27571 0.10732 2.56896 0.96825 33.34s
3 0.20262 0.08567 2.36524 0.97488 33.51s
4 0.16551 0.07695 2.15081 0.97705 33.50s
5 0.14173 0.06803 2.08322 0.98061 34.38s
6 0.12519 0.06067 2.06352 0.98239 34.02s
7 0.11077 0.05532 2.00254 0.98427 33.78s
8 0.10497 0.05771 1.81898 0.98248 34.17s
9 0.09881 0.05159 1.91509 0.98407 33.80s
10 0.09264 0.04958 1.86864 0.98526 33.40s
As you can see, the accuracy of the last epoch reached 0.98526, a pretty good performance for only 10 training epochs.
Prediction and confusion matrix
Now we use the trained model to predict over the entire test set:

preds = net1.predict(X_test)
We can also create a confusion matrix to check the performance of the neural network classifier:

cm = confusion_matrix(y_test, preds)
plt.matshow(cm)
plt.show()

The above code will plot the confusion matrix:
As you can see, the diagonal of the matrix is much denser, showing that our classifier did a good job.
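The "dense diagonal" observation can be quantified: the overall accuracy is the trace of the confusion matrix divided by its sum, and the diagonal divided by the row sums gives per-class recall. A toy sketch with an invented 3-class matrix (these numbers are made up, not the MNIST results above):

```python
import numpy as np

# rows = true labels, columns = predicted labels (invented 3-class numbers)
cm = np.array([[50,  2,  1],
               [ 3, 45,  0],
               [ 0,  4, 48]])

accuracy = np.trace(cm) / float(cm.sum())               # mass on the diagonal
per_class = np.diag(cm) / cm.sum(axis=1).astype(float)  # recall for each class
print(round(accuracy, 4))  # 0.9346
```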
Visualizing the filters
We can also visualize the 32 filters from the first convolutional layer:

visualize.plot_conv_weights(net1.layers_['conv2d1'])
The above code will plot the filters below:

The 5x5x32 filters of the first layer

As you can see, the nolearn plot_conv_weights function plots all the filters of the layer we specified.
Theano layer functions and feature extraction
We can now compile Theano functions that feed-forward the input data through the architecture up to any layer you are interested in. I'm going to get functions for the output layer, and also for the dense layer right before the output layer:
dense_layer = layers.get_output(net1.layers_['dense'], deterministic=True)
output_layer = layers.get_output(net1.layers_['output'], deterministic=True)
input_var = net1.layers_['input'].input_var
f_output = theano.function([input_var], output_layer)
f_dense = theano.function([input_var], dense_layer)
As you can see, we now have two Theano functions, f_output and f_dense (for the output and dense layers). Note that in order to get the layers here we used an extra parameter called deterministic; this is to keep the dropout layers from affecting our feed-forward pass.
Now we can convert an example instance to the input format and then feed it into the Theano function for the output layer: