Initializing the Weights in Multilayer Network with Quadratic

By Terry Bell,2014-11-26 12:07

40 526 U1180 neural networks

    Initializing the weights in multilayer network with quadratic sigmoid function

    Abstract  A new method of initializing the weights in back propagation networks using the quadratic threshold activation function with one hidden layer is developed. The weights can be obtained directly, even without further learning. The method relies on the general-position assumption about the distribution of the patterns and can be applied to many pattern recognition tasks. Finally, simulation results are presented, showing that this initialization technique generally yields a drastic reduction in training time and in the number of hidden neurons.

1 Introduction

    Using neural networks for pattern recognition applications is increasingly attractive. The main advantages of neural networks are their massive parallelism, adaptability, and noise tolerance (B. Widrow and R. Winter, 1988) (J. J. Hopfield, 1982) (T. J. Sejnowski and C. R. Rosenberg, 1986). One of the most popular neural networks is the back propagation (BP), or multilayer perceptron (MLP), neural network. The most commonly used activation functions in BP are the hard-limited threshold function and the sigmoid function.

    Using the hard-limited activation function in every neuron, the upper bound on the number of hidden neurons in a single-hidden-layer network required for solving a general-position two-class classification problem is ⌈K/n⌉, where K is the number of patterns and n is the input dimension. Without the general-position constraint, the upper bound on the number of hidden neurons is K − 1.

    Recently, a new quadratic threshold activation function was proposed (C. C. Chiang, 1993). Using it in each neuron, it is shown that the upper bound on the number of hidden neurons required for solving a given two-class classification problem can be reduced by one half compared with conventional multilayer perceptrons using the hard-limited threshold function. The results are given in Table 1. Since the quadratic function is somewhat more complicated than the hard-limited threshold function, learning is much more difficult for a BP network with the quadratic function: in typical simulations, neither the learning period nor the convergence properties are good enough to obtain effective results. To relieve this learning difficulty, a new method for initializing the weights in BP networks using the quadratic threshold activation function with one hidden layer is presented. The method is based on Gaussian elimination and is applicable to many classification tasks. The paper is organized as follows: the basic quadratic threshold function is described in section 2; the new initialization method is addressed in section 3; and finally, simulation results are shown in section 4.

     Activation      General-position    Not general-position
     Hard-limited    ⌈K/n⌉               K − 1
     QTF             ⌈K/(2n)⌉            ⌈K/2⌉

     Table 1. Number of hidden neurons required
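The bounds in Table 1 can be evaluated numerically. The sketch below is only an illustration of the table; `hidden_neuron_bounds` is a hypothetical helper name, not something from the paper:

```python
import math

def hidden_neuron_bounds(K, n):
    """Upper bounds on hidden neurons from Table 1.

    K: number of training patterns, n: input dimension.
    Returns the bounds for the hard-limited and quadratic threshold (QTF)
    activations, with and without the general-position assumption.
    """
    return {
        "hard-limited, general position": math.ceil(K / n),
        "hard-limited, not general position": K - 1,
        "QTF, general position": math.ceil(K / (2 * n)),
        "QTF, not general position": math.ceil(K / 2),
    }

bounds = hidden_neuron_bounds(K=100, n=5)
# In the general-position case the QTF halves the hard-limited bound:
# ceil(100/5) = 20 hidden neurons versus ceil(100/10) = 10.
```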

2 Quadratic Threshold Function

    The quadratic threshold function (QTF) is defined as (C. C. Chiang, 1993)


    Quadratic Threshold Function:  f(net, θ) = 0, if net² > θ;  1, if net² ≤ θ

    In (C. C. Chiang, 1993), an upper bound is derived on the number of hidden neurons required for implementing an arbitrary dichotomy on a K-element set S in Eⁿ, under the constraint that S is in general position.

    Definition 1  A K-element set S in Eⁿ is in general position if no (j + 1) elements of S lie in a (j − 1)-dimensional linear variety, for any j where 2 ≤ j ≤ n.
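Definition 1 can be checked by brute force: it is equivalent to requiring that every subset of j + 1 points (for j up to n) be affinely independent. The following is a minimal sketch; `in_general_position` is a hypothetical helper, exponential in |S| and meant only for small examples:

```python
from itertools import combinations
import numpy as np

def in_general_position(S, tol=1e-9):
    """Check Definition 1: no (j+1) points of S lie in a (j-1)-dimensional
    linear variety (affine subspace), for any j with 2 <= j <= n."""
    S = np.asarray(S, dtype=float)
    n = S.shape[1]
    for j in range(2, n + 1):
        for subset in combinations(S, j + 1):
            pts = np.array(subset)
            # (j+1) points lie in a (j-1)-dimensional variety iff the
            # difference vectors from the first point have rank < j
            diffs = pts[1:] - pts[0]
            if np.linalg.matrix_rank(diffs, tol=tol) < j:
                return False
    return True

# Three collinear points in E^2 violate general position (j = 2) ...
print(in_general_position([[0, 0], [1, 1], [2, 2]]))   # False
# ... while three non-collinear points satisfy it
print(in_general_position([[0, 0], [1, 0], [0, 1]]))   # True
```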

    Proposition 1 (S. C. Huang and Y. F. Huang, 1991, Proposition 1)  Let S be a finite set in Eⁿ in general position. Then, for any J-element subset S₁ of S, where 1 ≤ J ≤ n, there is a hyperplane, i.e. an (n − 1)-dimensional linear variety of Eⁿ, that contains S₁ and no other elements of S.

    In (E. B. Baum), Baum proved that if all the elements of a K-element set S in Eⁿ are in general position, then a single-hidden-layer MLP with ⌈K/n⌉ hidden neurons using the hard-limited threshold function can implement arbitrary dichotomies defined on S. In (C. C. Chiang, 1993), it is proved that a two-layered (one hidden layer) MLP with at most ⌈K/(2n)⌉ hidden neurons using the QTF is capable of implementing arbitrary dichotomies of a K-element set S in Eⁿ if S is in general position.

    Since the quadratic threshold function is non-differentiable, we use the quadratic sigmoid function (QSF) to ease the derivation:

    Quadratic Sigmoid Function:  f(net, θ) = 1 / (1 + exp(net² − θ))
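The two activation functions, as reconstructed above, can be sketched directly; this is a minimal illustration assuming θ thresholds net², and the function names are my own:

```python
import numpy as np

def qtf(net, theta):
    """Quadratic threshold function: 1 inside the strip net**2 <= theta, else 0."""
    return np.where(net**2 <= theta, 1, 0)

def qsf(net, theta):
    """Quadratic sigmoid function: a smooth, differentiable surrogate of the QTF,
    usable with gradient-based back propagation."""
    return 1.0 / (1.0 + np.exp(net**2 - theta))

# The QSF approaches the QTF away from the strip boundary:
# near 1 inside the strip (net = 0), near 0 outside it (net = 3)
print(qtf(np.array([0.0, 3.0]), theta=4.0))
print(qsf(np.array([0.0, 3.0]), theta=4.0))
```

Unlike the ordinary sigmoid, both functions respond to net² rather than net, so each hidden neuron carves out a strip bounded by two parallel hyperplanes instead of a single half-space, which is the source of the halved neuron count in Table 1.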

3 Description of this Method

    Consider a classification problem consisting in assigning K vectors of Rⁿ to 2 predetermined classes. Let the given training set H = {x_1, x_2, …, x_K} = {H_0, H_1} be partitioned into K_0 ≤ K training vectors in subset H_0 corresponding to class 0 and K_1 ≤ K training vectors in subset H_1 corresponding to class 1, where K_0 + K_1 = K, and H_0 = {p^1, p^2, …, p^{K_0}}, H_1 = {q^1, q^2, …, q^{K_1}}. The classification can be implemented within a two-layer neural network with n + 1 input units,