2D+3D Face Recognition Using Dual-Tree Complex Wavelet


Wang Xueqiao, Ruan Qiuqi, An Gaoyun, Jin Yi

(Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China)

Abstract: A fully automatic framework is proposed for face recognition, and its superior performance is demonstrated on the FRGC v2 data. 2D and 3D facial representations extracted by the Dual-tree Complex Wavelet Transform (DT-CWT) are introduced in this paper to reflect the facial geometry properties. The level four high-frequency components of the 2D texture image and the 3D depth image are obtained respectively, and then Linear Discriminant Analysis (LDA) is used to obtain the feature vector. Cosine distance is used to establish the similarity matrices. Finally, the two similarity matrices are fused. The verification rate at an FAR of 0.1% is 97.6% in the All vs. All experiment.

    Key words: Face recognition; Dual-tree Complex Wavelet Transform; Linear Discriminant Analysis

0 Introduction

Studies in 2D face recognition have achieved significant progress, with methods such as PCA (Principal Component Analysis) [1], LDA (Linear Discriminant Analysis) [2], and ICA (Independent Component Analysis) [3], but they still bear limitations, mostly due to pose variation, illumination, make-up, and facial expression. In recent years, more and more researchers have focused on this topic and proposed a large number of improved methods to overcome the obstacles of 3D face recognition. Beumier et al. [4] proposed two algorithms for facial surface registration. Central and lateral profiles were compared in the curvature space to achieve 3D face recognition. However, the method required a high computation cost and needed high-resolution 3D face data due to the sensitivity of curvature-based features. The Hausdorff distance metric defined by Huttenlocher et al. [5] was first applied to 3D face recognition by Pan et al. [6] and Lee and Shim [7], and improved by Russ et al. [8]. Chua et al. [9] first applied Iterative Closest Point (ICP) [10] to 3D face registration, and then separated the rigid parts of the face from the non-rigid parts using a Gaussian distribution to achieve recognition. Zhong et al. [11] used 3D depth images for recognition. Gabor filters applied to the 3D face images can effectively extract intrinsic discriminative information, and a Learned Visual Codebook (LVC) can be constructed by learning the centers from K-means clustering of the filter response vectors. The 3D face images can then be represented by the LVC coefficients, and recognition is achieved using a Nearest Neighbor (NN) classifier. They further developed a Quadtree clustering algorithm to estimate the facial codes, which could further boost the performance [12]. Chang et al. [13] used PCA to extract the intrinsic discriminant feature vectors from 2D intensity images and 3D depth images respectively, and then the fusion of the 2D and 3D results was used to obtain the final performance. In recent years, many new 3D face recognition methods demonstrated on the FRGC v2 data have achieved good performance. Faltemier et al. [14] divided the whole 3D face image into 28 small parts for 3D face recognition and fused the results from independently matched regions. Yueming Wang et al. [15] extracted Gabor, LBP, and Haar

Foundations: This work is supported by the National Natural Science Foundation (No. 60973060); the Specialized Research Fund for the Doctoral Program of Higher Education (No. 200800040008); the Doctoral Candidate Outstanding Innovation Foundation (No. K11JB00290); the National Natural Science Foundation of China (No. 61003114); and the Fundamental Research Funds for the Central Universities (No. 2011JBM020 and No. 2011JBM022).

Brief author introduction: Wang Xueqiao (1986-), female, 3D face recognition. E-mail:



features from the depth image, and then the most discriminative local features were selected by boosting and trained as weak classifiers for assembling three collective strong classifiers. Mian et al. [16] used a Spherical Face Representation (SFR) for the 3D facial data and the SIFT descriptor for the 2D data to train a rejection classifier. The remaining faces were verified using a region-based matching approach which is robust to facial expression. Berretti et al. [17] proposed an approach that uses a graph form to reflect the geometrical information of the 3D facial surface, so that the relevant information among neighboring points can be encoded into a compact representation. Alyuz et al. [18] proposed expression-resistant 3D face recognition based on regional registration. A region-based registration scheme was used to establish the relationship among all the gallery samples in a single registration pass via common region models. Zhang et al. [19] found a novel local feature with the distinct characteristic of resolution invariance. They fused six different scale-invariant similarity measures at the score level, which effectively overcomes the influence of large facial expression variations. Because both 2D and 3D faces contain important information about a person, this paper uses a fusion method for face recognition.

We use the Dual-tree Complex Wavelet Transform (DT-CWT) for face recognition due to its attractive properties: approximate shift invariance, approximate rotation invariance, orientation selectivity, and efficient computation. DT-CWT descriptors utilize the squared magnitude of a complex wavelet coefficient to evaluate the spectral energy in the space, scale, and orientation of a particular location. Shift invariance can handle depth faces that are slightly shifted due to inaccurate nose detection. Rotation invariance can cope with rotated faces caused by imperfect ICP registration. Orientation selectivity can deal with slight facial expressions. 2D and 3D facial representations extracted by the DT-CWT are introduced in this paper to reflect the facial geometry properties. The level four high-frequency components of the 2D texture image and the 3D depth image are obtained respectively, and then Linear Discriminant Analysis (LDA) is used to obtain the feature vector. Cosine distance is used to establish the similarity matrices. Finally, the two similarity matrices are fused.

The paper is organized as follows. In Section 1, the data preprocessing method is presented. In Section 2, the DT-CWT feature is described. Experimental results of face recognition are given in Section 3, and conclusions are drawn in Section 4.

1 Data Preprocessing

Because there are spikes and noise in some 3D faces, a 3×3 Gaussian filter is first used to remove spikes and noise. Since the texture channel and the 3D face data of the FRGC database [22] correspond well, we use the Ada-boost face detection method [20] on the 2D texture image to help extract the 3D facial region. Fig.1 shows some examples of detected faces, which we call texture images in this paper.
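As a concrete illustration, the spike-removal step can be sketched as a 3×3 Gaussian smoothing of a depth map. The binomial kernel weights and the edge-replication padding below are assumptions, since the paper does not specify them:

```python
import numpy as np

def gaussian3x3_smooth(depth):
    """Smooth a depth map with a 3x3 Gaussian kernel to suppress spikes.

    The kernel is the separable [1, 2, 1]/4 binomial filter, a common
    discrete approximation of a small Gaussian.
    """
    kernel = np.array([[1, 2, 1],
                       [2, 4, 2],
                       [1, 2, 1]], dtype=float) / 16.0
    padded = np.pad(depth, 1, mode="edge")
    out = np.zeros_like(depth, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + depth.shape[0],
                                           dx:dx + depth.shape[1]]
    return out

# A flat surface with a single spike: smoothing spreads the spike out.
z = np.zeros((5, 5))
z[2, 2] = 16.0
smoothed = gaussian3x3_smooth(z)
```

After smoothing, the spike's energy is redistributed over its 3×3 neighborhood while the total depth mass is preserved.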

     Fig.1 Texture image

     1.1 Nose Detection

Before we detect the nose tip, we first find the central stripe. The nose tip lies on the central stripe, so the area that contains the nose tip is reduced. The stripe is 2 mm wide and is used for nose detection subsequently. Our method is simpler than other central-stripe-finding



method [15]. Let F = {p_i | p_i = (x_i, y_i, z_i), 1 ≤ i ≤ N} denote the point set of a 3D face. Firstly, we map every point p_i(x_i, y_i, z_i) (1 ≤ i ≤ N) to p'_i(x'_i, y'_i, z'_i). The transformation is as follows:

    x'_i = max_i(x_i) + min_i(x_i) − x_i
    y'_i = y_i                                                          (1)
    z'_i = z_i
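A minimal sketch of the mirroring in equation (1), which reflects the point set about the vertical plane through the midpoint of its x-extent while leaving y and z untouched:

```python
import numpy as np

def mirror_face(points):
    """Mirror a 3-D point set per equation (1):
        x' = max(x) + min(x) - x,  y' = y,  z' = z
    points: (N, 3) array of (x, y, z) coordinates.
    """
    mirrored = points.astype(float).copy()
    x = points[:, 0]
    mirrored[:, 0] = x.max() + x.min() - x
    return mirrored

# Toy point set: x ranges over [0, 4], so x' = 4 - x here.
pts = np.array([[0.0, 0.0, 1.0],
                [4.0, 1.0, 2.0],
                [1.0, 2.0, 3.0]])
m = mirror_face(pts)
```

Because the mirrored set occupies the same x-range as the original, ICP only has to recover the small residual misalignment between the face and its reflection.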

Let F' represent the point set of the transformed 3D face. Fig.2(a) shows an example of the transformation, where the green face is the transformed version of the red one. Then we use ICP [10] for the registration between F and F'. F' is moved toward F, and the transformation matrix M is

recorded. The result is shown in Fig.2(b). Because 3 points determine a plane, we choose 3 points a, b, c in F and find the corresponding points a', b', c' in F'. After this, we calculate the 3 points a'', b'', c'' in the plane which divides the face into two parts. a'', b'', c'' are calculated as in equation (2):

    a'' = (a + a') / 2,  b'' = (b + b') / 2,  c'' = (c + c') / 2        (2)

We use the positions of the 3 points a''(x_1, y_1, z_1), b''(x_2, y_2, z_2), c''(x_3, y_3, z_3) to establish the plane equation, which is presented in equation (3):

    | x    y    z    1 |
    | x_1  y_1  z_1  1 |
    | x_2  y_2  z_2  1 |  =  0                                          (3)
    | x_3  y_3  z_3  1 |

After this, we find a stripe which is the intersection of the plane and the 3D face. In Fig.2(c), we show
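The plane of equation (3) and the 2 mm stripe can be sketched as follows. The unit-normal form (n, d) used here is equivalent to the determinant form, and the tiny point cloud is only a toy example:

```python
import numpy as np

def plane_from_points(a, b, c):
    """Return (unit normal n, offset d) of the plane through three
    points, i.e. the plane n . x + d = 0 encoded by equation (3)."""
    n = np.cross(b - a, c - a)
    n = n / np.linalg.norm(n)
    d = -np.dot(n, a)
    return n, d

def central_stripe(points, n, d, half_width=1.0):
    """Points within half_width (mm) of the plane: a 2 mm wide stripe."""
    dist = np.abs(points @ n + d)
    return points[dist <= half_width]

# The plane x = 0, defined by three points in the y-z plane:
a = np.array([0.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])
c = np.array([0.0, 0.0, 1.0])
n, d = plane_from_points(a, b, c)

cloud = np.array([[0.5, 0.0, 0.0],   # within 1 mm of the plane
                  [3.0, 1.0, 1.0],   # too far away
                  [-0.2, 2.0, 0.0]]) # within 1 mm of the plane
stripe = central_stripe(cloud, n, d)
```

Only the points whose distance to the symmetry plane is at most 1 mm survive, giving the 2 mm wide central stripe.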

the stripe in yellow. The stripe is 2 mm wide. We use the stripe, which avoids local minima when aligning two stripes with the ICP [10] method, to find the nose tip in the following step.

Fig.2 Finding the central stripe of a 3D face.

We use the first person's face as the standard face and manually find its nose tip on its stripe. For the other faces, we find the nose tip with an automatic algorithm. Suppose that A is the first face's central stripe and B is the central stripe of the face whose nose tip we want to find. First of all, we align stripe A to stripe B using ICP and use the transformation matrix M to find the point p, which is the first person's transformed nose-tip point. Then we consider a sphere C (radius of 37 mm) centered at the point p. Finally, the highest point within the sphere is taken as the nose tip. The whole procedure is shown in Fig.3.
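The final selection step can be sketched as below: restrict the stripe to the 37 mm sphere around the transformed standard nose tip p, then take the point with the largest z (the assumption here is that z points toward the camera, so "highest" means largest z):

```python
import numpy as np

def find_nose_tip(stripe_points, p, radius=37.0):
    """Highest point (largest z) of the stripe within a sphere of the
    given radius (mm) centered at the transformed standard nose tip p."""
    dist = np.linalg.norm(stripe_points - p, axis=1)
    candidates = stripe_points[dist <= radius]
    return candidates[np.argmax(candidates[:, 2])]

stripe = np.array([[0.0,  0.0, 10.0],
                   [0.0,  5.0, 30.0],
                   [0.0, 90.0, 50.0]])  # last point lies outside the sphere
p = np.array([0.0, 0.0, 20.0])
tip = find_nose_tip(stripe, p)
```

Even though the third point has the largest z overall, it is rejected by the sphere constraint, which is what makes the method robust to stray stripe points far from the nose.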

Fig.3 The steps of finding the nose tip from the central stripe.



1.2 Face Cropping and Depth Image

After all the nose tips are detected, we segment a facial region with a sphere of radius 100 mm centered at the nose tip. Then, Iterative Closest Point (ICP) [10] is used for registration. The registration process is very important, because it can correct large pose variations of the 3D face. In addition, since the mouth area is sensitive to expression variations (such as laughing or anger), we use a curvature flow smoothing method [23] to smooth the mouth region, with 100 smoothing iterations. The curvature flow smoothing method can effectively reduce the effect of surface noise and make the facial surface smoother. Finally, the depth images are constructed from the face region. The size of the depth image is 128 × 128. Some examples are presented in Fig.4.

Fig.4 Depth image and texture image

2 Face Recognition Using Dual-Tree Complex Wavelet Features
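The cropping and depth-image construction can be sketched as follows. The rasterization rule (scaling x-y into the 128 × 128 grid and keeping the largest z per cell) is an assumption; the paper does not describe its projection in detail:

```python
import numpy as np

def crop_and_rasterize(points, nose_tip, radius=100.0, size=128):
    """Keep points within `radius` mm of the nose tip, then rasterize
    them into a size x size depth image (one z value per x-y cell,
    keeping the largest z when several points fall into one cell)."""
    keep = points[np.linalg.norm(points - nose_tip, axis=1) <= radius]
    depth = np.zeros((size, size))
    # Map x and y into pixel coordinates over the cropped region.
    xy = keep[:, :2]
    lo, hi = xy.min(axis=0), xy.max(axis=0)
    scale = (size - 1) / np.maximum(hi - lo, 1e-9)
    cols = np.round((xy[:, 0] - lo[0]) * scale[0]).astype(int)
    rows = np.round((xy[:, 1] - lo[1]) * scale[1]).astype(int)
    for r, c, z in zip(rows, cols, keep[:, 2]):
        depth[r, c] = max(depth[r, c], z)
    return depth

cloud = np.array([[0.0,   0.0, 50.0],
                  [10.0, 10.0, 40.0],
                  [200.0, 0.0,  0.0]])  # far outside the 100 mm sphere
img = crop_and_rasterize(cloud, np.array([0.0, 0.0, 50.0]))
```

A real pipeline would also fill holes between samples; this sketch only shows the sphere cropping and the 128 × 128 layout.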

     2.1 Dual-tree complex wavelet transform

DT-CWT [21] is proposed for face recognition due to its attractive properties: approximate shift invariance, approximate rotation invariance, orientation selectivity, and efficient computation [24]. It is effective and efficient at extracting the geometrical properties embedded in the depth image, which is examined by our experiments in Section 3.

The 2-D DT-CWT [21,24] is characterized by six wavelets, which give six bandpass subbands of complex coefficients at each level, strongly oriented at angles of −75°, −45°, −15°, 15°, 45°, 75°, respectively. The six wavelets are:

    ψ_1(x, y) = φ(x)ψ(y),  ψ_2(x, y) = ψ(x)ψ(y),  ψ_3(x, y) = ψ(x)φ(y),
    ψ_4(x, y) = ψ(x)φ(y),  ψ_5(x, y) = ψ(x)ψ(y),  ψ_6(x, y) = φ(x)ψ(y).


Fig.5 DT-CWT wavelets in the spatial domain and their support of the spectrum in the 2-D frequency plane, oriented at −75°, −45°, −15°, 15°, 45°, 75°, respectively [21,23].

2.2 Face recognition using dual-tree complex wavelet transform

In the training section, we use all the faces in FRGC v1 for training. The size of each depth

image and texture image is 128 × 128, while each level four magnitude subimage is 8 × 8. We concatenate the six magnitude subimages of the depth image and of the texture image into a large vector respectively. Then we use LDA [2] to establish two LDA subspaces.
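The feature construction for one image can be sketched with mock subband data. In practice the complex coefficients come from a DT-CWT implementation (for example the third-party `dtcwt` Python package); the random array below only stands in for the level-four highpass output of a 128 × 128 image:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the level-4 DT-CWT highpass output of a 128x128 image:
# an 8x8 grid of complex coefficients for each of the six orientations.
level4 = (rng.standard_normal((8, 8, 6))
          + 1j * rng.standard_normal((8, 8, 6)))

# Magnitude subimages, one per orientation, concatenated into one vector.
magnitudes = np.abs(level4)                          # (8, 8, 6), real
feature = magnitudes.transpose(2, 0, 1).reshape(-1)  # 6 * 8 * 8 = 384 values
```

Each face thus yields one 384-dimensional vector per modality (depth and texture), which is what the LDA subspaces are trained on.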

In the testing section, we obtain the level four magnitude subimages of each tested face, which comprises a depth image and a texture image, and then vectorize them into a large vector respectively. Then we use the LDA transformation matrices trained in the training section to obtain the



features of the face (each face has two features: the depth image feature and the texture image feature). Cosine distance is used to establish two similarity matrices S.
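The similarity matrices and their fusion can be sketched as below. The min-max normalization and equal weighting in `fuse` are assumptions, since the paper does not spell out its fusion rule:

```python
import numpy as np

def cosine_similarity_matrix(gallery, probe):
    """Similarity matrix S with S[i, j] = cosine of the angle between
    gallery feature vector i and probe feature vector j."""
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    p = probe / np.linalg.norm(probe, axis=1, keepdims=True)
    return g @ p.T

def fuse(s_depth, s_texture, w=0.5):
    """Score-level fusion of the two similarity matrices after
    min-max normalization (normalization and weight are assumptions)."""
    def norm(s):
        return (s - s.min()) / (s.max() - s.min() + 1e-12)
    return w * norm(s_depth) + (1 - w) * norm(s_texture)

gallery = np.array([[1.0, 0.0], [0.0, 1.0]])
probe = np.array([[1.0, 0.0]])
s = cosine_similarity_matrix(gallery, probe)
fused = fuse(s, s)
```

Cosine similarity is 1 for identical directions and 0 for orthogonal ones, so each column of S ranks the gallery faces against one probe.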

Fig.6 The flow of our method.

     3 Results and analysis

We conducted five experiments, including the Neutral vs. Neutral experiment, the Neutral vs. Non-Neutral experiment, the All vs. All experiment, and the ROC III experiment. The ROC curves of the first four experiments are presented in Fig.7. We compared the texture image and depth image results, and from the figures, we found that the fusion method achieved the best performance.

Fig.7 ROC curves: verification rate vs. false accept rate (log scale) for the texture, depth, and fusion methods.
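The headline metric, verification rate at a fixed false accept rate, can be computed from a similarity matrix as sketched below; the thresholding convention (at most `far` of impostor scores accepted) is an assumption:

```python
import numpy as np

def verification_rate_at_far(scores, genuine_mask, far=1e-3):
    """Verification rate at a fixed false accept rate.

    scores: flat array of similarity scores for all compared pairs.
    genuine_mask: boolean array, True where the pair is a genuine
    (same-person) comparison. The threshold is chosen so that at most
    a fraction `far` of impostor scores exceed it."""
    impostor = np.sort(scores[~genuine_mask])[::-1]  # descending
    k = max(int(far * impostor.size), 1)
    threshold = impostor[k - 1]
    genuine = scores[genuine_mask]
    return float(np.mean(genuine > threshold))

# Toy scores: impostors spread over [0, 0.5], genuines clustered at 0.9.
impostor_scores = np.linspace(0.0, 0.5, 2000)
genuine_scores = np.full(100, 0.9)
scores = np.concatenate([genuine_scores, impostor_scores])
mask = np.zeros(scores.size, dtype=bool)
mask[:100] = True
vr = verification_rate_at_far(scores, mask, far=1e-3)
```

With well-separated score distributions, as in this toy example, every genuine score clears the FAR = 0.1% threshold, giving a verification rate of 1.0.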