Stepwise Discriminant Function Analysis
SPSS will do stepwise DFA. You simply specify which method you wish to employ for selecting predictors. The most economical method is the Wilks lambda method, which
selects predictors that minimize Wilks lambda. As with stepwise multiple regression, you
may set the criteria for entry and removal (F criteria or p criteria), or you may take the defaults.
Imagine that you are working as a statistician for the Internal Revenue Service. You are told that another IRS employee has developed four composite scores (X1 - X4), easily
computable from the information that taxpayers provide on their income tax returns and from other databases to which the IRS has access. These composite scores were developed in the hope that they would be useful for discriminating tax cheaters from other persons. To see if these composite scores actually have any predictive validity, the IRS selects a random sample of taxpayers and audits their returns. Based on this audit, each taxpayer is placed into one of three groups: Group 1 is persons who overpaid their taxes by a considerable
amount, Group 2 is persons who paid the correct amount, and Group 3 is persons who
underpaid their taxes by a considerable amount. X1 through X4 are then computed for each
of these taxpayers. You are given a data file with group membership, X1, X2, X3, and X4 for
each taxpayer, with an equal number of subjects in each group. Your job is to use discriminant function analysis to develop a pair of discriminant functions (weighted sums of X1 through X4) to predict group membership. You use a fully stepwise selection procedure to
develop a (maybe) reduced (less than four predictors) model. You employ the WILKS method of selecting variables to be entered or deleted, using the default p criterion for
entering and removing variables.
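The Wilks lambda criterion named above can be sketched in a few lines of Python. This is a minimal illustration, not SPSS's implementation: it computes lambda as the ratio of the determinant of the pooled within-groups SSCP matrix to the determinant of the total SSCP matrix, and takes one forward step by picking the candidate that gives the smallest lambda. The function names and the toy data are hypothetical (they are not the IRS data), and the sketch ignores the F or p criteria that SPSS also applies before entering or removing a variable.

```python
from itertools import chain

def sscp(rows):
    """Sums of squares and cross-products of rows about their column means."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows)
             for j in range(p)] for i in range(p)]

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    n, d = len(m), 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[pivot][c]) < 1e-12:
            return 0.0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            d = -d
        d *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= f * m[c][k]
    return d

def wilks_lambda(groups, cols):
    """Lambda = |W| / |T| for the predictors whose column indices are in cols.

    groups is a list of groups; each group is a list of cases, and each case
    is a list of predictor values.
    """
    def pick(rows):
        return [[r[c] for c in cols] for r in rows]

    p = len(cols)
    within = [[0.0] * p for _ in range(p)]
    for g in groups:
        s = sscp(pick(g))
        for i in range(p):
            for j in range(p):
                within[i][j] += s[i][j]
    total = sscp(pick(list(chain.from_iterable(groups))))
    return det(within) / det(total)

def best_to_enter(groups, entered, candidates):
    """One forward step: the candidate whose entry yields the smallest lambda."""
    return min(candidates, key=lambda c: wilks_lambda(groups, entered + [c]))

# Hypothetical toy data: three groups, two predictors (columns 0 and 1).
toy_groups = [
    [[1, 5], [2, 3], [3, 4]],
    [[4, 4], [5, 5], [6, 3]],
    [[7, 3], [8, 4], [9, 5]],
]
print(wilks_lambda(toy_groups, [0]), wilks_lambda(toy_groups, [1]))
print(best_to_enter(toy_groups, [], [0, 1]))
```

In the toy data the groups differ widely on column 0 and not at all on column 1, so column 0 is the one chosen for entry.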
The data file (DFA-…) is on the Data page -- download it and then bring it into SPSS. To do the DFA, click Analyze, Classify, Discriminant, then put Group into the “Grouping Variable” box, defining its range from 1 to 3. Put X1 through X4 in the “Independents” box, and select the stepwise method.
Copyright 2008 Karl L. Wuensch - All rights reserved.
Click Continue. Click Method and select “Wilks’ lambda” and “Use probability of F.”
Under Statistics, ask for the group means. Under Classify, ask for a territorial map.
Look at the output, “Variables Not in the Analysis.” At Step 0 the tax groups (overpaid, paid correct, underpaid) differ most on X3 (Wilks lambda drops to .636 if X3 is entered) and “Sig. of F to
enter” is less than .05, so that predictor is entered first. After entering X3, all remaining
predictors are eligible for entry, but X1 most reduces lambda, so it enters. The Wilks lambda
is reduced from .635 to .171. On the next step, only X2 is eligible to enter, and it does,
lowering Wilks lambda to .058. At this point no variable already in meets the criterion for
removal and no variable out meets the criterion for entry, so the analysis stops.
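The step-to-step drop in lambda is what the “F to enter” statistic tests. As a hedged sketch in Python: the partial lambda for a candidate is the ratio of lambda after entry to lambda before entry, and the usual F to enter is ((N - g - q) / (g - 1)) * (1 - ratio) / ratio, where N is the number of cases, g the number of groups, and q the number of predictors already in the model. The function names are hypothetical, and because the sample size is not stated here, the sketch applies the handout's lambdas only to the partial-lambda ratio; the F example uses made-up numbers.

```python
def partial_lambda(lambda_before, lambda_after):
    """Ratio of Wilks lambda after a predictor enters to lambda before."""
    return lambda_after / lambda_before

def f_to_enter(lambda_before, lambda_after, n_cases, n_groups, n_already_in):
    """F to enter with its degrees of freedom (g - 1, N - g - q)."""
    ratio = partial_lambda(lambda_before, lambda_after)
    df1 = n_groups - 1
    df2 = n_cases - n_groups - n_already_in
    f = (df2 / df1) * (1.0 - ratio) / ratio
    return f, df1, df2

# Partial lambdas implied by the reported values .635 -> .171 -> .058.
print(round(partial_lambda(0.635, 0.171), 3))  # second predictor entered
print(round(partial_lambda(0.171, 0.058), 3))  # third predictor entered

# Made-up illustration (NOT the IRS data): first predictor, so lambda before
# entry is 1; 9 cases in 3 groups; lambda after entry is 0.1.
f, df1, df2 = f_to_enter(1.0, 0.1, n_cases=9, n_groups=3, n_already_in=0)
print(round(f, 2), df1, df2)
```

With no predictors yet in the model, this F reduces to the ordinary one-way ANOVA F for the candidate predictor.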
Look back at the Step 0 statistics. Only X2 and X3 were eligible for entry. Note,
however, that after X3 was entered, the p to enter dropped for all remaining predictors. Why?
X3 must suppress irrelevant variance in the other predictors (and vice versa). After X1 is
added to X3, p to enter for X4 rises, indicating redundancy of X4 with X1.
Interpretation of the Output from the Example Program
If you look at the standardized coefficients and loadings you will see that high scores on DF1 result from high X3 and low X1. If you look back at the group means you will
see that those who underpaid are characterized by having low X3 and high X1, and thus low
DF1. This suggests that DF1 is good for discriminating the cheaters (those who underpaid)
from the others. The centroids confirm this.
If you look at the standardized coefficients and loadings for DF2 you will see that high
DF2 scores come from having high X2 and low X1. From the group means you see that those
who overpaid will have low DF2 (since they have a low X2 and a high X1). DF2 seems to be
good for separating those who overpaid from the others, as confirmed by the centroids for DF2.
In the territorial map the underpayers are on the left, having a low DF1 (high X1 and
low X3). The overpayers are on the lower right, having a high DF1 and a low DF2 (low X2,
high X3, high X1). Those who paid the correct amount are in the upper right, having a high
DF1 and a high DF2 (low X1, high X2, high X3).
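The way a territorial map carves up the DF1-DF2 plane can be sketched in Python as a nearest-centroid rule. This assumes equal prior probabilities and a pooled within-groups covariance matrix, under which a case is assigned to the group whose centroid is closest in discriminant space. The centroid values below are hypothetical, chosen only to mimic the layout described above (underpayers on the left, overpayers lower right, correct payers upper right); they are not taken from the SPSS output.

```python
import math

# Hypothetical (DF1, DF2) centroids that mimic the described layout only.
CENTROIDS = {
    "underpaid": (-3.0, 0.0),     # left: low DF1
    "overpaid": (1.5, -1.5),      # lower right: high DF1, low DF2
    "paid correct": (1.5, 1.5),   # upper right: high DF1, high DF2
}

def classify(df1, df2, centroids=CENTROIDS):
    """Assign a case to the group with the nearest centroid in DF space."""
    return min(
        centroids,
        key=lambda g: math.hypot(df1 - centroids[g][0], df2 - centroids[g][1]),
    )

print(classify(-2.0, 0.3))
print(classify(2.0, -1.0))
print(classify(1.0, 2.0))
```

Each printed label names the territory in which that pair of discriminant scores would fall under these assumed centroids.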