Discriminant Function Analysis with Three or More Groups

By Thomas Rose,2014-11-26 11:40

     With more than two groups one can obtain more than one discriminant function. The first DF is the one that maximally separates the groups (it produces the largest ratio of among-groups to within-groups SS on the resulting D scores). The second DF, orthogonal to the first, maximally separates the groups on variance not yet explained by the first DF. One can find a total of K-1 (number of groups minus 1) or p (number of predictor variables) orthogonal discriminant functions, whichever is smaller.
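     As a minimal sketch of this rule (the function name is hypothetical and not part of SPSS), the count for the example used in this lesson, with three nursing groups and four visits variables, could be computed in Python as:

```python
def max_discriminant_functions(n_groups: int, n_predictors: int) -> int:
    """Return the smaller of K - 1 and p, the maximum number of discriminant functions."""
    if n_groups < 2 or n_predictors < 1:
        raise ValueError("need at least two groups and one predictor")
    return min(n_groups - 1, n_predictors)


# Three nursing groups (K = 3) and four visits variables (p = 4).
n_functions = max_discriminant_functions(n_groups=3, n_predictors=4)
print(n_functions)  # prints 2
```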

     We shall use the data from Experiment 1 of my dissertation to illustrate a discriminant function analysis with three groups. The analysis I reported when I published this research was a doubly multivariate repeated measures ANOVA (see Wuensch, K. L., Fostering house mice onto rats and deer mice: Effects on response to species odors. Animal Learning and Behavior, 1992, 20, 253-258). Wild-strain house mice were, at birth, cross-fostered onto house-mouse (Mus), deer-mouse (Peromyscus), or rat (Rattus) nursing mothers. Ten days after weaning, each subject was tested in an apparatus that allowed it to enter tunnels scented with clean pine shavings or with shavings bearing the scent of Mus, Peromyscus, or Rattus. One of the variables measured was the number of visits to each tunnel during a twenty-minute test. Also measured were how long each subject spent in each of the four tunnels and the latency to first visit of each tunnel. We shall use the visits data for our discriminant function analysis.

     The data are in the SPSS data file TUNNEL4b.sav. Download it from my SPSS-Data page. The variables in this data file are:

     NURS (nursing group: 1 for Mus-reared, 2 for Peromyscus-reared, and 3 for Rattus-reared)

     V1, V2, V3, and V4 (labeled Clean-V, Mus-V, Pero-V, and Rat-V; these are the raw data for number of visits to the clean, Mus-scented, Peromyscus-scented, and Rattus-scented tunnels)

     V_Clean, V_Mus, V_Pero, and V_Rat (the visits data after a square root transformation to reduce positive skewness and stabilize the variances)

     T1, T2, T3, and T4 (time in seconds spent in each tunnel)

     T_Clean, T_Mus, T_Pero, and T_Rat (the time data after a square root transformation to reduce positive skewness)

     L1, L2, L3, and L4 (the latency data in seconds)

     L_Clean, L_Mus, L_Pero, and L_Rat (the latency data after a log transformation to reduce positive skewness)

     For this lesson we shall use only the NURS variable and the visits variables.

Obtaining Means and Standard Deviations for the Untransformed Data

     Open the TUNNEL4b.sav file in SPSS. Click Analyze, Compare Means, Means.

     Copyright 2008 Karl L. Wuensch - All rights reserved.



     Scoot V1, V2, V3, and V4 into the Dependent List and Nurs into the Independent List. Click OK.

     The output produced here is a table of means and standard deviations for untransformed number of visits to each tunnel for each nursing group. Look at the means for the Mus group and the Peromyscus group. These two groups were very similar to one another. Both visited the tunnels with moderate frequency, except for the rat-scented tunnel, which they avoided. Now look at the means for the Rattus-reared group. These animals appear to have been much more active, visiting the tunnels more frequently than did animals in the other groups, and they did not avoid the rat-scented tunnel. Look at the standard deviations for V4. There is troublesome heterogeneity of variance here.

Conducting the Discriminant Function Analysis

     Now let us do the discriminant function analysis on the transformed data. Click Analyze, Classify, Discriminant. Put V_Clean, V_Mus, V_Pero, and V_Rat into the Independents box. Put Nurs into the Grouping Variable box.

     Click Nurs and then Define Range and define the range from 1 to 3.

     Continue. Click Statistics and select Means, ANOVAs, and Box’s M.

     Continue. Click Classify and select Casewise Results, Summary Table, Combined Groups Plot, and Territorial Map.

     Continue. Click Save and select Discriminant scores.

     Continue, OK.

Interpreting the Output

     Now look at the output. The means show the same pattern observed with the untransformed data and the standard deviations show that the heterogeneity of variance has been greatly reduced by the square root transformation.
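     To see why a square root transformation helps with count data, here is a small Python sketch. The visit counts in it are made up for illustration (they are not the values in TUNNEL4b.sav), and the ratio of the largest to the smallest group variance is used only as a rough index of heterogeneity.

```python
import math
import statistics

# Hypothetical visit counts for one tunnel, NOT the real data.
raw_visits = {
    "Mus": [1, 4, 2, 5, 3, 6, 2, 4],
    "Peromyscus": [0, 2, 1, 3, 1, 2, 0, 3],
    "Rattus": [6, 14, 9, 18, 11, 20, 8, 16],
}


def variance_ratio(groups):
    """Largest group variance divided by the smallest group variance."""
    variances = [statistics.variance(scores) for scores in groups.values()]
    return max(variances) / min(variances)


# Apply the square root transformation to every count.
sqrt_visits = {
    group: [math.sqrt(count) for count in counts]
    for group, counts in raw_visits.items()
}

ratio_raw = variance_ratio(raw_visits)
ratio_sqrt = variance_ratio(sqrt_visits)
print(f"raw: {ratio_raw:.2f}, square root: {ratio_sqrt:.2f}")
```

     With counts like these, where the more active group is also the more variable one, the variance ratio is much smaller after the transformation than before it.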


     The univariate ANOVAs show that the groups differ significantly on number of visits to the rat-scented tunnel and the clean tunnel, with the differences in number of visits to the other two tunnels falling short of statistical significance. Box’s M shows no problem with the assumption of equal variance/covariance matrices.

     Look under the heading “Eigenvalues.” Two discriminant functions are obtained. The first accounts for 1.641/(1.641 + .111) = 94% of the total among-groups variability. The second accounts for the remaining 6%.
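     The same arithmetic can be written as a short Python sketch that works for any number of functions. Only the two eigenvalues come from the output; the variable names are arbitrary.

```python
eigenvalues = [1.641, 0.111]  # eigenvalues reported in the SPSS output

# Each function's share of the total among-groups variability.
total = sum(eigenvalues)
proportions = [value / total for value in eigenvalues]

for index, share in enumerate(proportions, start=1):
    print(f"Function {index}: {share:.1%} of among-groups variability")
```

     The two shares round to 94% and 6%.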

     SPSS uses a stepwise backwards deletion to assess the significance of the discriminant functions. The first Wilks’ lambda tests the null hypothesis that in the population the groups do not differ from one another on mean D for any of the discriminant functions. This Wilks’ lambda is evaluated with a chi-square approximation, and for our data it is significant. In the second row are the same statistics for evaluating all discriminant functions except the first. We have only 2 functions, so this evaluates DF2 by itself. If we had 3 functions, functions 2 and 3 would be simultaneously evaluated at this point and we would have a third row evaluating function 3 alone. Our second DF falls short of statistical significance.
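     The following Python sketch shows how each row of that table relates to the eigenvalues. It assumes that Wilks’ lambda for a set of functions is the product of 1/(1 + eigenvalue) over those functions and that the chi-square is Bartlett’s approximation, -[N - 1 - (p + K)/2] ln(lambda), with (p - k)(K - k - 1) degrees of freedom after the first k functions are removed. The helper name is hypothetical. N = 36, p = 4, and K = 3 are the values for this lesson.

```python
import math


def sequential_wilks_tests(eigenvalues, n_cases, n_predictors, n_groups):
    """Test functions k+1 through the last together, for k = 0, 1, ... (hypothetical helper)."""
    # Bartlett's multiplier: N - 1 - (p + K) / 2
    multiplier = n_cases - 1 - (n_predictors + n_groups) / 2
    results = []
    for k in range(len(eigenvalues)):
        # Wilks' lambda for the functions that remain after removing the first k.
        wilks_lambda = 1.0
        for value in eigenvalues[k:]:
            wilks_lambda /= 1.0 + value
        results.append({
            "functions": f"{k + 1} through {len(eigenvalues)}",
            "wilks_lambda": wilks_lambda,
            "chi_square": -multiplier * math.log(wilks_lambda),
            "df": (n_predictors - k) * (n_groups - k - 1),
        })
    return results


tests = sequential_wilks_tests([1.641, 0.111], n_cases=36, n_predictors=4, n_groups=3)
for row in tests:
    print(row["functions"], round(row["wilks_lambda"], 3),
          round(row["chi_square"], 2), row["df"])
```

     Because the eigenvalues are rounded to three decimals, the chi-square values from this sketch can differ from the SPSS output in the second decimal place.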

     To interpret the first discriminant function, let us first look at the standardized discriminant function coefficients. DF1 is most heavily weighted on V_Rat. Subjects who visited the rat tunnel often should get a high score on DF1. The loadings (in the structure matrix) show us that subjects who scored high on DF1 tended to visit all of the tunnels (but especially the rat-scented tunnel) frequently.

     Although it fell short of statistical significance, I shall, for pedagogical purposes, attempt to interpret the second discriminant function. Both the standardized discriminant function coefficients and the loadings indicate that scoring high on DF2 results from tending to visit the Peromyscus-scented tunnel frequently and the clean tunnel infrequently.

     Under “Functions at Group Centroids” we are given the group means on each of the discriminant functions. DF1 separates the rat-reared animals (who score high on this function) from the animals in the other two groups. DF2 separates the Mus-reared animals (who score high on this function) from the Peromyscus-reared animals. If you look back at the transformed group means you can see this separation: compared to the Peromyscus-reared animals, the Mus-reared animals visited the Peromyscus-scented tunnel more frequently and the clean tunnel less frequently.

     Territorial maps provide a nice picture of the relationship between predicted group and the two discriminant functions. Look at the map for our example data. Subjects with D1 and D2 scores that place them in the area marked off by 3’s are classified into Group 3 (rat-reared). The asterisk (*) marks the group centroid. Group 3 is on the right side of the map, having high scores on DF1 (high activity and no avoidance of the rat-scented tunnel). Subjects with low D1 and high D2 scores fall in the upper left side of the map and are classified into Group 1 (Mus-reared), while those with low scores on both discriminant functions are classified into Group 2 (Peromyscus-reared).

     When the primary goal is classification, all discriminant functions (including any that are not significant) are generally used. Look at the Casewise Statistics from our example analysis. The classifications are based on probabilities using both discriminant functions. For example, for subject 1, .881 = P(Group = 2 | D1 = -1.953 and D2 = -1.684), while the posterior probability of membership in Group 1 is .118. Accordingly, this subject is classified as being in Group 2 (Peromyscus-reared) when, in fact, it was in Group 1 (Mus-reared).
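     Posterior probabilities of this kind can be approximated by hand. The Python sketch below assumes equal prior probabilities for the three groups and a common within-groups covariance matrix, so that in the space of the discriminant scores each group is treated as a bivariate normal distribution with unit variances centered on its centroid. The centroid coordinates are approximate values assumed here for illustration (they are consistent with the reported eigenvalues but are not copied from the SPSS output), and the function name is hypothetical.

```python
import math

# ASSUMED approximate centroids (Function 1, Function 2); not taken from the output.
CENTROIDS = {
    1: (-0.54, 0.43),   # Group 1, Mus-reared
    2: (-1.16, -0.34),  # Group 2, Peromyscus-reared
    3: (1.70, -0.09),   # Group 3, Rattus-reared
}


def posterior_probabilities(scores, centroids, priors=None):
    """Posterior P(group | discriminant scores), assuming unit-variance normal groups."""
    if priors is None:
        # Equal priors are an assumption of this sketch.
        priors = {group: 1.0 / len(centroids) for group in centroids}
    weights = {}
    for group, centroid in centroids.items():
        # Squared Euclidean distance from the case to the group centroid.
        distance_sq = sum((s - c) ** 2 for s, c in zip(scores, centroid))
        weights[group] = priors[group] * math.exp(-0.5 * distance_sq)
    total = sum(weights.values())
    return {group: weight / total for group, weight in weights.items()}


# Subject 1: D1 = -1.953, D2 = -1.684
posteriors = posterior_probabilities((-1.953, -1.684), CENTROIDS)
predicted_group = max(posteriors, key=posteriors.get)
print(predicted_group, {g: round(p, 3) for g, p in posteriors.items()})
```

     With these assumed centroids the sketch places subject 1 in Group 2 with a posterior probability of about .88 and gives Group 1 a probability of about .12, close to the values in the Casewise Statistics.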


     The combined groups plot, “Canonical Discriminant Functions,” is best viewed in color, since group membership is coded by color. In this plot you can see where each subject falls in the space defined by the two discriminant functions.

    [Figure: Canonical Discriminant Functions. Scatterplot of Function 2 (vertical axis) against Function 1 (horizontal axis), with cases coded by nursing group and the group centroids marked.]

     The Classification Results show that knowledge of the animals’ behavior in the testing apparatus greatly increased our ability to predict what species of animal reared it. If we were just guessing, we would expect to have a 33% success rate. Using the discriminant functions, we correctly classify 83% of the rat-reared animals and 62% of the other animals.

Follow-Up Analysis

     Look back at the data set. At the very end you will find two new variables, Dis1_1 and Dis2_1. These are the subjects’ scores on the two discriminant functions. I find it useful to make pairwise comparisons on the means of the discriminant functions and on the means of the predictor variables which had significant univariate effects.

     Click Analyze, Compare Means, One-Way ANOVA. Scoot NURS into the Factor box and scoot V_Clean, V_Rat, Dis1_1, and Dis2_1 into the Dependent List.

     Click Post Hoc and select LSD.


     Continue. OK.

     Look at the output from the ANOVA. For either discriminant function, take the Among Groups sum of squares and divide by the Within Groups sum of squares. You get the eigenvalue for that discriminant function. Now take the Among Groups sum of squares and divide by the total sum of squares, and then take the square root of the resulting R². You get the canonical correlation for that discriminant function. Finally, for the last (second) discriminant function, take the Error (Within Groups) sum of squares and divide by the total sum of squares. You will obtain the Wilks’ lambda for that discriminant function.
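     These three relationships can be sketched in Python. The sums of squares used here are not copied from the ANOVA output: they are illustrative values for the second discriminant function, chosen under the assumption that the saved discriminant scores have a pooled within-groups variance of 1 (so that the within-groups sum of squares is N - K = 33) and that the eigenvalue is .111. The function name is hypothetical.

```python
import math


def discriminant_summary(ss_among, ss_within):
    """Eigenvalue, canonical correlation, and single-function Wilks' lambda from ANOVA SS."""
    ss_total = ss_among + ss_within
    r_squared = ss_among / ss_total
    return {
        "eigenvalue": ss_among / ss_within,
        "canonical_r": math.sqrt(r_squared),
        # For the last function alone, Wilks' lambda equals 1 - R squared.
        "wilks_lambda": ss_within / ss_total,
    }


# ASSUMED sums of squares for the second discriminant function (see the text above).
ss_within_df2 = 33.0
ss_among_df2 = 0.111 * ss_within_df2

summary_df2 = discriminant_summary(ss_among_df2, ss_within_df2)
print({name: round(value, 3) for name, value in summary_df2.items()})
```

     With these inputs the Wilks’ lambda comes out at about .900, which agrees with the value reported for the second function in this lesson.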

     The multiple comparisons show that the rat-reared group differs significantly from the other two groups on number of visits to the clean tunnel, on number of visits to the rat-scented tunnel, and on the first discriminant function.

Presenting the Results of a Discriminant Function Analysis

     The manner in which the results are presented depends in part on the goals of the analysis: was the focus of the research on developing a model with which to classify subjects into groups, or was it on determining how the groups differ on a set of continuous variables? In the behavioral sciences the focus is more often the latter.

     You should pay attention to the example presentations in Tabachnick and Fidell. Here I present the results of the analysis done during this lesson.



     In order to determine how the nursing groups differed with respect to their response to the four scented tunnels, we conducted a discriminant function analysis. The data were subjected to a square root transformation prior to analysis, to reduce positive skewness and stabilize the variances. The first discriminant function was statistically significant, Λ = .341, χ²(8, N = 36) = 33.92, p < .001, but the second was not, Λ = .900, χ²(3, N = 36) = 3.32, p = .34. As shown in Table 1, high scores on the first discriminant function were associated with having made many visits to the tunnels, especially to the Rattus-scented tunnel.

Table 1

Structure of the Discriminant Function

    Variable                                Loading

    Visits to Rattus-scented tunnel           .92

    Visits to clean tunnel                    .40

    Visits to Peromyscus-scented tunnel       .30

    Visits to Mus-scented tunnel              .30

     Univariate analysis showed that the nursing groups differed significantly on visits to the Rattus-scented tunnel, F(2, 33) = 22.98, MSE = 0.72, p < .001, and the clean tunnel, F(2, 33) = 4.54, MSE = 1.10, p = .018, but not on visits to the other two tunnels, .08 < p < .10.

     Table 2 contains the group means (centroids) on the discriminant function as well as the group means on each of the four original variables. Fisher’s procedure was employed to make pairwise comparisons. It should be noted that when employed to make pairwise comparisons among three and only three groups, Fisher’s procedure has been found to hold familywise error at or below the nominal rate and to have more power than commonly employed alternative procedures (Levin, Serlin, & Seaman, 1994). The Rattus-nursed mice scored significantly higher on the discriminant function than did mice in the other two groups and made significantly more visits to the Rattus-scented and the clean tunnels than did mice in the other two groups. All other pairwise comparisons fell short of statistical significance.
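     For readers who want to see what Fisher’s procedure compares, the Python sketch below computes the least significant difference (LSD), the smallest difference between two group means that would be declared significant at the .05 level. It assumes 12 animals in each nursing group (36 animals in three equal groups, which is an assumption of this sketch), uses the MSE values from the univariate ANOVAs on the transformed data, and hard-codes a tabled two-tailed .05 critical value of t for 33 degrees of freedom, approximately 2.0345. The function name is hypothetical.

```python
import math

T_CRITICAL_DF33 = 2.0345  # tabled two-tailed .05 critical t for 33 df (approximate)
N_PER_GROUP = 12          # ASSUMED equal group sizes


def lsd_critical_difference(mse, n_per_group, t_critical):
    """Smallest significant difference between two group means under Fisher's LSD."""
    standard_error = math.sqrt(2.0 * mse / n_per_group)
    return t_critical * standard_error


lsd_rat_tunnel = lsd_critical_difference(0.72, N_PER_GROUP, T_CRITICAL_DF33)
lsd_clean_tunnel = lsd_critical_difference(1.10, N_PER_GROUP, T_CRITICAL_DF33)
print(round(lsd_rat_tunnel, 3), round(lsd_clean_tunnel, 3))
```

     These critical differences are on the square root scale, so they apply to the means of the transformed variables rather than to means of the raw visit counts.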

Table 2

    Group Means on the Discriminant Functions and the Original Four Variables

                                                      Nursing Group
    Variable                                Rattus      Peromyscus      Mus

    Discriminant Function 1                  1.70^A      -1.16^B       -0.54^B

    Visits to Rattus-scented tunnel         12.75^A       1.50^B        3.33^B

    Visits to clean tunnel                  10.25^A       4.58^B        4.67^B

    Visits to Peromyscus-scented tunnel      7.92^A       4.33^A        5.58^A

    Visits to Mus-scented tunnel            10.50^A       6.00^A        6.25^A

    Note. Within each row, means having the same letter in their superscripts are not significantly different from each other at the .05 level.
