Shape-Based Hand Recognition

Ender Konukoğlu¹, Erdem Yörük¹, Jerôme Darbon², Bülent Sankur¹

¹ Electrical and Electronic Engineering Department, Boğaziçi University, Bebek, İstanbul, Turkey
² EPITA (Ecole Pour l'Informatique et les Techniques Avancées)
[konuk, yoruk, sankur]@boun.edu.tr; email@example.com
We address the problem of identifying persons from images of their hands. The system uses images of the subjects' right hands, captured by a flatbed scanner in an unconstrained pose. In a preprocessing stage, the hand silhouettes are registered to a fixed pose, which involves rotation and translation of the whole hand and, separately, rotation of the individual fingers. Two feature sets are comparatively assessed: the Hausdorff distance between hand contours, and independent component features of the hand silhouette images. Both classification and verification performance are found to be very satisfactory, showing that, at least for groups of about a hundred subjects, hand-based recognition is a viable scheme for secure access control.
1. INTRODUCTION
The emerging field of biometric technology addresses the automated identification of individuals based on their physiological and behavioral traits. The broad category of human authentication schemes, denoted as biometrics, encompasses many techniques from computer vision and pattern recognition. The personal attributes used in a biometric identification system can be physiological, such as facial features, fingerprints, iris, retinal scans, and hand and finger geometry; or behavioral, that is, traits idiosyncratic to the individual, such as voice print, gait, signature, and keystroke dynamics. Depending on the complexity or the security level of the application, one may opt to use one or more of these personal characteristics.
In this paper, we investigate hand shape as a distinctive personal attribute for an authentication task. Although the use of hands as biometric evidence is not very new, and an increasing number of commercial products are being deployed, the literature on hand-based recognition is scarcer than that on other modalities such as face or voice. However, processing hands requires less complexity in terms of imaging conditions; for example, a relatively simple sensor such as a flatbed scanner suffices. Consequently, hand-based biometry is user-friendly, less prone to disturbances, and robust to environmental conditions. In comparison, face recognition is quite sensitive to pose, facial accessories, expression, and lighting variations; iris- or retina-based identification requires special illumination and is much less user-friendly; fingerprint imaging requires good frictional skin, etc. Therefore, authentication based on hand shape can be an attractive alternative due to its unobtrusiveness, low cost, easy interface, and low data storage requirements. Note that access control based on hand geometry is increasingly being deployed. These applications range from passport control in airports to international banks, from parents' access to child daycare centers to university student meal programs, and from hospitals and prisons to nuclear power plants. Some of the interesting applications have been interactive kiosks, time and attendance control, anti-passback (preventing a cardholder from passing his/her card back to an accomplice), and collection of the transactions of a service system.
Hand-based authentication schemes in the literature are mostly based on geometrical features. For example, Sanchez-Reillo et al. measure finger widths at different latitudes, finger and palm heights, finger deviations, and the angles of the inter-finger valleys with the horizontal. The twenty-five selected features are modeled with Gaussian mixture models specific to each individual. Öden, Erçil and Büke have used a fourth-degree implicit polynomial representation of the extracted finger shapes, in addition to geometric features such as finger widths at various positions and the palm size. The resulting sixteen features are compared using the Mahalanobis distance. Jain, Ross and Pankanti have used a peg-based imaging scheme and obtained sixteen features, which include the length and width of the fingers, the aspect ratio of the palm to the fingers, and the thickness of the hand. The prototype system they developed was tested in a verification experiment for web access over a group of 10 people. Bulatov et al. extract geometric features similar to [21, 20, 22] and compare two classifiers.
The method of Jain and Duta is somewhat similar to ours in that it compares contour shapes via the mean square error and involves finger alignment. Lay introduced a technique where the hand is illuminated with a parallel grating that serves both to segment the hand from the background and to enable the user to register his hand with one of the stored contours. The geometric features of the hand shape are captured by the quadtree code.
Finally, let us note that there exist a number of patents on hand information-based personnel identification, relying either on geometrical features or on the hand profile.
In our paper, we employ a hand shape-based approach for person identification and/or verification. The algorithm starts by preprocessing the acquired image, which involves segmentation and normalization of the hand's deformable shape. In this context, "hand normalization" signifies the registration of the fingers and of the hand to standard positions, by separate rotations of the fingers as well as rotation and translation of the whole hand. Person identification is then based either on the comparison of hand silhouette shapes using the Hausdorff distance or on the distance between feature vectors, namely independent component analysis (ICA) features. The features used and the data sizes in different algorithms are summarized in Table I.
Table I: Characteristics and population sizes of the hand-based recognition algorithms.
| Algorithm | Features & Classification | Number of subjects | Images per subject |
|---|---|---|---|
| Oden et al. | 16 features: geometric features and implicit polynomial invariants of fingers. Classifier based on Mahalanobis distance. | 35 | 10 |
| Sanchez-Reillo et al. | 25 geometric features including finger and palm thickness. Classifier based on Gaussian mixture models. | 20 | 10 |
| Duta-Jain | Hand contour data. Classifier based on mean average distance of contours. | 53 | variable (from 2 to 15) |
| Ross | 17 geometric features including length, height and thickness of fingers and palm. Classifier based on Euclidean and Mahalanobis distances. | 50 | variable (7 on average) |
| Bulatov et al. | 30 geometric features including length and height of fingers and palm. Classifier based on Chebyshev metric between feature vectors. | 70 | 10 |
| Our methods | 1st method: features consist of hand contour data; classifier based on modified Hausdorff distance. 2nd method: features consist of independent components of the hand silhouette; classifier is the Euclidean distance. | 118 | 3 |
We assume that the user of this system will be cooperative, since he/she is requesting access. In other words, the user has no interest in causing the access mechanism to fail by moving or jittering his/her hand or by keeping the fingers crumpled or stuck together. On the other hand, the implementation neither assumes nor forces any particular hand orientation. The orientation of the hand and fingers is automatically recovered from the scanned image, and the hand is then normalized.
The paper is organized as follows. In Section 2, the segmentation of hand images from their background is presented. The normalization steps for the deformable hand images are given in Section 3. Section 4 details the computation of features from the normalized hand silhouettes. The experimental setup and the classification results are discussed in Section 5, and conclusions are drawn in Section 6.
2. HAND SEGMENTATION
The hand segmentation aims to extract the hand region from the background. At first sight, segmenting a two-object scene, consisting of a hand and the background, seems a relatively easy task. However, segmentation accuracy may suffer from artifacts due to rings, overlapping cuffs or wristwatch belts/chains, or creases around the borders caused by pressing too lightly or too heavily. Furthermore, the delineation of the hand contour must be very accurate, since the differences between the hands of different individuals are often minute. We have comparatively evaluated two alternative segmentation methods, namely, clustering followed by morphological operations, and watershed transform-based segmentation. Interestingly enough, Canny edge-based segmentation with snake completion [6, 27] did not work well, due to the difficulty of fitting snakes to the very sharp concavities between the fingers. Snake algorithms performed adequately only if they were properly initialized at the extremities.
2.1 Segmentation Using the Watershed Transform
The segmentation by watershed involves two steps: marker extraction and watershed transform. Marker extraction leads to one connected component inside each object of interest, while the
watershed transform propagates these markers to define the object boundaries.
Marker Extraction: To extract one marker for the hand and another for the background, a two-class clustering operation is used. The two largest connected components obviously correspond to the hand and to the background. However, due to noise, dirt spots, and/or ring artifacts on the hand, the class markers may be disconnected (Fig. 3). Such artifacts can be remedied by imposing label connectivity via a Markov Random Field (MRF) model.
Let $h(s)$ and $l(s)$ denote, respectively, the image features and their class labels, with $h = \{h(s)\}$ and $l = \{l(s)\}$ both defined on the lattice $\Omega$ of the image, and $s \in \Omega$ any element of this lattice. An initial label field $l$ of the hand and the background can be obtained directly from the distances to the two class centroids, where $l$ obviously takes only two labels, namely, hand and background. We then consider pairwise interactions between neighboring pixel positions, resulting in the following energy term:

$$E(l) = \sum_{s \in \Omega} D\big(h(s), l(s)\big) + \sum_{\langle s,r \rangle} V\big(l(s), l(r)\big)$$

where $\langle s,r \rangle$ means that $s$ and $r$ are neighbors. $D$ is a data term, which measures how well the labeling fits the observed data (i.e., it is based on the Mahalanobis distance between the image pixel $h(s)$ and the centroid $c_{l(s)}$ of the class indicated by the label $l(s)$). $V$ is a prior term on the labeling we are interested in. We use the Ising model for the prior, $V(l(s), l(r)) = \beta\,\big(1 - \delta(l(s), l(r))\big)$, where $\delta$ refers to the Kronecker symbol and $\beta$ is a weighting term for the prior; this model penalizes the number of discontinuities. The resulting energy minimization thus becomes:

$$\hat{l} = \arg\min_{l} \sum_{s \in \Omega} \left[ \frac{1}{2} \big(h(s) - c_{l(s)}\big)^{T} \Sigma_{l(s)}^{-1} \big(h(s) - c_{l(s)}\big) + \frac{1}{2} \log \lvert \Sigma_{l(s)} \rvert \right] + \beta \sum_{\langle s,r \rangle} \big(1 - \delta(l(s), l(r))\big)$$

where $\Sigma_l$ denotes the covariance matrix of the data for a given label $l$, and $\lvert \Sigma_l \rvert$ its determinant.
In the case of gray-level features, the covariance matrix in the data-fitting term reduces to the variance. We implemented the segmentation on both color and gray-level image features, and the outcomes were very similar. Hence, in the sequel, all results are obtained by running the minimization on gray-level images only, although we keep the energy expression above in its general vector form. We minimize this energy using a fast algorithm based on the graph cut method. The weight factor $\beta$ is taken as 1, though any value $1 \le \beta \le 2$ produces the same effect.
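For the gray-level case, where each class covariance reduces to a variance, the energy above can be evaluated directly for any candidate label field. The following NumPy sketch does only that; it does not perform the graph-cut minimization. It assumes labels stored as an integer array, 4-connected neighbor pairs, and class means and variances re-estimated from the labeling being scored. The function name `mrf_energy` and the small variance floor are illustrative choices, not part of the original method.

```python
import numpy as np


def mrf_energy(image, labels, beta=1.0):
    """Energy of a label field for gray-level features (sketch).

    Data term: per-pixel Gaussian negative log-likelihood, i.e. half the
    squared Mahalanobis distance to the class mean plus half the log of the
    class variance. Prior: Ising model on 4-connected neighbor pairs, adding
    beta for every pair whose labels differ.
    """
    image = np.asarray(image, dtype=float)
    labels = np.asarray(labels)
    data = 0.0
    for k in np.unique(labels):
        values = image[labels == k]
        mean = values.mean()
        var = values.var() + 1e-6  # floor avoids division by zero for flat classes
        data += np.sum(0.5 * (values - mean) ** 2 / var + 0.5 * np.log(var))
    # Number of label discontinuities between horizontal and vertical neighbors
    disc = np.count_nonzero(labels[:, 1:] != labels[:, :-1])
    disc += np.count_nonzero(labels[1:, :] != labels[:-1, :])
    return data + beta * disc
```

Scoring a clean labeling against one with a single isolated mislabeled pixel illustrates how both the data term and the Ising term favor spatially coherent labels.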
The segments resulting from the above minimization can still contain more than two connected components; only the two largest are kept. Finally, both markers are eroded with a centered disc whose radius is set to 2 for the hand marker and to 30 for the background marker. Note that the output of the MRF minimization is not yet the final segmentation, since the Ising model smooths boundaries. Exact boundaries are extracted using the watershed transform.
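The marker post-processing step can be sketched with NumPy as follows. The sketch assumes a label map in which 1 marks the hand class and 0 the background, defines components with 4-connectivity, and keeps the largest component of each class (our reading of "the two largest" components). During erosion, pixels outside the image are treated as background, so the background marker also shrinks away from the image border. The helper names are hypothetical, and a library routine for labeling and erosion would normally replace these hand-written loops.

```python
from collections import deque

import numpy as np


def largest_component(mask):
    """Return the largest 4-connected component of a boolean mask."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    seen = np.zeros_like(mask)
    best = []
    for start in zip(*np.nonzero(mask)):
        if seen[start]:
            continue
        seen[start] = True
        queue, comp = deque([start]), []
        while queue:  # breadth-first flood fill of one component
            r, c = queue.popleft()
            comp.append((r, c))
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if 0 <= nr < h and 0 <= nc < w and mask[nr, nc] and not seen[nr, nc]:
                    seen[nr, nc] = True
                    queue.append((nr, nc))
        if len(comp) > len(best):
            best = comp
    out = np.zeros_like(mask)
    if best:
        rows, cols = zip(*best)
        out[list(rows), list(cols)] = True
    return out


def erode_disc(mask, radius):
    """Binary erosion by a centered disc; pixels outside the image count as background."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.ones_like(mask)
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            if dr * dr + dc * dc <= radius * radius:
                out &= padded[radius + dr:radius + dr + h, radius + dc:radius + dc + w]
    return out


def extract_markers(labels, hand_radius=2, background_radius=30):
    """Keep the largest component of each class and erode it (1 = hand, 0 = background)."""
    labels = np.asarray(labels)
    hand = erode_disc(largest_component(labels == 1), hand_radius)
    background = erode_disc(largest_component(labels == 0), background_radius)
    return hand, background
```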
Watershed Segmentation: To complete the segmentation, we use the morphological gradient of the gray-level image $h$, given by $g(h) = \delta_B(h) - \varepsilon_B(h)$, where $\varepsilon_B$ and $\delta_B$ are, respectively, the gray-level erosion and dilation by the structuring element $B$. We choose $B$ as a centered disc of radius 3 for our experiments. This gradient image can be seen as a topographic map, which in turn is modified using minima imposition, such that the extracted markers constitute its only minima while the highest crest lines separating the markers are not modified. Finally, we apply the watershed transform to this image, which amounts to flooding the relief with water rising from its regional minima. Efficient algorithms for performing the watershed transform are available in the literature.
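The morphological gradient with a disc-shaped structuring element takes only a few lines of NumPy. The sketch below takes the pixelwise maximum (dilation) and minimum (erosion) over all disc offsets and returns their difference. Replicating the image borders is our assumption, and the function names are illustrative. Minima imposition and the watershed flooding itself are not reproduced here.

```python
import numpy as np


def disc_offsets(radius):
    """Offsets (dr, dc) of a centered digital disc of the given radius."""
    return [(dr, dc)
            for dr in range(-radius, radius + 1)
            for dc in range(-radius, radius + 1)
            if dr * dr + dc * dc <= radius * radius]


def morphological_gradient(image, radius=3):
    """Gray-level morphological gradient: dilation minus erosion by a disc."""
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    padded = np.pad(image, radius, mode="edge")  # replicate border values
    # One shifted copy of the image per disc offset
    shifted = np.stack([padded[radius + dr:radius + dr + h, radius + dc:radius + dc + w]
                        for dr, dc in disc_offsets(radius)])
    dilation = shifted.max(axis=0)
    erosion = shifted.min(axis=0)
    return dilation - erosion
```

On a vertical step edge, the gradient is nonzero only within the disc radius of the edge, which is the band where the watershed lines are expected to fall.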
2.2 Segmentation Using Clustering and Morphological Smoothing
Since the number of classes is known, we have also experimented with the K-means clustering algorithm, with K = 2. However, without any regularization, the resulting map can end up with holes and isolated foreground blobs, as well as fingers severed by rings. We used morphological operators to fill in the holes in the hand region and to remove debris, that is, the small isolated blobs in the background.
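A two-class (K = 2) K-means on pixel gray levels can be sketched as follows. The min/max initialization, the iteration cap, and the convention that the returned mask is the brighter cluster are our assumptions; whether the hand is the brighter or the darker cluster depends on the acquisition setup. The hole filling and debris removal that follow are not included.

```python
import numpy as np


def kmeans_two_class(image, iterations=20):
    """K-means with K = 2 on gray levels; returns the brighter-cluster mask and centers."""
    values = np.asarray(image, dtype=float).ravel()
    centers = np.array([values.min(), values.max()])  # deterministic initialization
    for _ in range(max(1, iterations)):
        # Assign each pixel to the nearest center
        assign = np.abs(values[:, None] - centers[None, :]).argmin(axis=1)
        # Recompute centers; an empty cluster keeps its previous center
        new_centers = np.array([values[assign == k].mean() if np.any(assign == k) else centers[k]
                                for k in range(2)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return (assign == 1).reshape(np.shape(image)), centers
```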
We apply area closing/opening and pick the largest connected components in the labeled image and in its complement, yielding, respectively, the body of the hand and the background. We first fill in the holes inside both components and then determine the hand boundary pixels. Finally, we apply a "ring artifact removal" algorithm (explained in Section 3.2) to correct for any straits or isthmuses caused by the presence of rings. The resulting performance was on a par with that of the watershed transform-based algorithm.
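Hole filling can be sketched as a flood fill of the background starting from the image border: any background pixel that cannot be reached from the border belongs to a hole and is switched to the hand class. The 4-connectivity used for the background and the function name are assumptions; area opening/closing and ring artifact removal are not shown.

```python
from collections import deque

import numpy as np


def fill_holes(mask):
    """Fill holes: background pixels not 4-connected to the image border become foreground."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    reached = np.zeros_like(mask)  # background pixels connected to the border
    queue = deque()
    for r in range(h):
        for c in range(w):
            on_border = r in (0, h - 1) or c in (0, w - 1)
            if on_border and not mask[r, c]:
                reached[r, c] = True
                queue.append((r, c))
    while queue:  # breadth-first flood fill of the outer background
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < h and 0 <= nc < w and not mask[nr, nc] and not reached[nr, nc]:
                reached[nr, nc] = True
                queue.append((nr, nc))
    return ~reached
```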
In summary, the clustering-based segmentation is simple, but necessitates post-processing for ring artifact removal, while the watershed transform-based segmentation yields hands without artifacts, but its parameters should be set carefully.
3. NORMALIZATION OF HAND CONTOURS
The normalization of hand images involves registering the hand images, that is, applying a global rotation and translation, as well as re-orienting the fingers individually along standardized directions without causing any shape distortion. This is in fact the most critical operation for a hand shape-based biometric application. The necessity of finger re-orientation is illustrated in Fig. 1 and was also pointed out in [10, 17]. This figure shows two images of the hand of the same person taken in two different sessions. The left figures show the results after hand registration (but not yet finger registration), while the figures on the right show the outcomes after finger registration. Hand registration involves two steps: i) translation to the centroid; ii) rotation toward the direction of the major eigenvector, that is, the eigenvector corresponding to the larger eigenvalue of the inertia matrix. The inertia matrix is simply the 2×2 matrix of the second-order centered moments of the hand pixel coordinates, computed about their centroid. Unless the fingers have been set to standard orientations, recognition performance remains very poor, since the shape discrepancy between two superimposed images of the same hand (intra-difference) can easily exceed the distance between hands belonging to different individuals (inter-difference). Notice in the left column of Fig. 1 the residual shape differences after global hand registration, which involves translation of the centroid and alignment of the orientation, but before finger alignment. The steps of the algorithm are given below in Subsections 3.1 to 3.5.
Fig. 1: Two superposed contours of the hand of the same individual: a) rigid hand registration only; b) finger alignment after hand registration.
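The two global registration steps, translation to the centroid and rotation toward the major eigenvector of the inertia matrix, can be sketched with NumPy as below. The sketch works on foreground pixel coordinates rather than resampling the image, and it aligns the major axis with the x-axis; both choices, as well as the function name, are assumptions. Because an eigenvector is only defined up to sign, the result may still be flipped by 180 degrees, which a real system would have to resolve separately (for instance, from the wrist position).

```python
import numpy as np


def register_hand(mask):
    """Global registration of a binary hand silhouette (sketch).

    Returns the foreground pixel coordinates after (i) translation of the
    centroid to the origin and (ii) a rotation that brings the major
    eigenvector of the inertia matrix onto the x-axis, together with the
    centroid and the rotation angle (radians).
    """
    rows, cols = np.nonzero(mask)
    pts = np.column_stack([cols, rows]).astype(float)  # (x, y) pixel coordinates
    centroid = pts.mean(axis=0)
    centered = pts - centroid
    # 2x2 inertia matrix: second-order centered moments of the coordinates
    inertia = centered.T @ centered / len(centered)
    _, eigvecs = np.linalg.eigh(inertia)  # eigenvalues in ascending order
    major = eigvecs[:, -1]
    angle = np.arctan2(major[1], major[0])
    c, s = np.cos(-angle), np.sin(-angle)
    rotation = np.array([[c, -s], [s, c]])  # rotates by -angle
    return centered @ rotation.T, centroid, angle
```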
3.1 Localization of Hand Extremities
Detecting and localizing the hand extremities, that is, the fingertips and the valleys between the fingers, is the first step of hand normalization. Since both types of extremities are characterized by high curvature, we first experimented with the curvegram of the contour, that is, the plot of the contour curvature at various scales along the path-length parameter. The nine maxima in the curvegram that were consistent across all scales were taken as the sought-after hand extremities. However, we observed that this technique was rather sensitive to contour irregularities, such as spurious cavities and kinks, especially around the ill-defined wrist region.
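For reference, a single scale of a curvegram, that is, the curvature of the contour after Gaussian smoothing at scale σ, can be computed in the Fourier domain as sketched below, assuming a closed contour with uniformly spaced samples. The curvature formula κ = (x′y″ − y′x″)/(x′² + y′²)^{3/2} is standard; the FFT-based smoothing and the function name are our choices. Stacking the result over several values of σ gives the curvegram, whose peaks in |κ| mark candidate extremities.

```python
import numpy as np


def curvature_at_scale(x, y, sigma):
    """Curvature of a closed, uniformly sampled contour smoothed at scale sigma.

    Smoothing and differentiation are done in the Fourier domain, which
    treats the contour as periodic. sigma is measured in samples.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    omega = 2 * np.pi * np.fft.fftfreq(len(x))  # angular frequency per sample
    gauss = np.exp(-0.5 * (sigma * omega) ** 2)  # transfer function of the Gaussian
    X = np.fft.fft(x) * gauss
    Y = np.fft.fft(y) * gauss
    xd = np.fft.ifft(1j * omega * X).real      # first derivatives
    yd = np.fft.ifft(1j * omega * Y).real
    xdd = np.fft.ifft(-omega ** 2 * X).real    # second derivatives
    ydd = np.fft.ifft(-omega ** 2 * Y).real
    return (xd * ydd - yd * xdd) / (xd ** 2 + yd ** 2) ** 1.5
```

As a sanity check, a counterclockwise circle of radius 10 yields a curvature close to 0.1 at every sample; the slight deviation comes from the shrinkage caused by smoothing.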
A more robust alternative was provided by the plot of the radial distance with respect to a reference point in the wrist region. This reference point was taken as the first intersection point of the major axis (the larger eigenvector of the inertia matrix) with the wrist line. The resulting sequence of radial distances yields minima and maxima corresponding to the sought extremum points. The resulting extrema are very stable, since the 5 maxima (fingertips) and 4 minima (valleys) are not affected by contour noise. The radial distance function and a typical hand contour with the extremities marked on it are shown in Fig. 2.
Fig. 2: a) Radial distance function for finger extraction; b) A hand contour with marked extremities.
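The extremum search on the radial distance function can be sketched as follows. The sketch assumes an ordered contour that starts and ends at the wrist, so that the five fingers appear consecutively, and a reference point already computed as described above. It takes the five largest local maxima as fingertips and, as our own simplification, the deepest point between each pair of consecutive fingertips as a valley. Function and parameter names are hypothetical, and a real contour may need light smoothing first so that noise does not create extra local maxima.

```python
import numpy as np


def radial_extrema(contour, reference, n_tips=5):
    """Locate fingertips and inter-finger valleys from the radial distance function.

    contour:   (N, 2) array of boundary points, ordered from one side of the
               wrist around the fingers to the other side (assumed).
    reference: (2,) reference point in the wrist region.
    Returns the radial distances, fingertip indices and valley indices.
    """
    contour = np.asarray(contour, dtype=float)
    r = np.linalg.norm(contour - np.asarray(reference, dtype=float), axis=1)
    # Local maxima (a plateau is counted once, at its first sample); the first
    # and last samples are compared with their circular neighbors.
    prev, nxt = np.roll(r, 1), np.roll(r, -1)
    maxima = np.nonzero((r > prev) & (r >= nxt))[0]
    # Keep the n_tips highest maxima, then restore contour order
    tips = np.sort(maxima[np.argsort(r[maxima])[::-1][:n_tips]])
    # Valley = deepest point between two consecutive fingertips
    valleys = np.array([a + np.argmin(r[a:b + 1]) for a, b in zip(tips[:-1], tips[1:])],
                       dtype=int)
    return r, tips, valleys
```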
3.2 Ring Artifact Removal
The presence of rings may cause separation of the finger from the palm or may create an isthmus on the finger (Fig. 3a). Firstly, an isolated finger can simply be detected by the size of its connected component since the main body of the hand should be the largest foreground component. Such a