
# Stepwise Discriminant Function Analysis

By Jonathan Payne,2014-11-26 12:32

SPSS will do stepwise DFA. You simply specify which method you wish to employ for selecting predictors. The most economical method is the Wilks lambda method, which selects predictors that minimize Wilks lambda. As with stepwise multiple regression, you may set the criteria for entry and removal (F criteria or p criteria), or you may take the defaults.
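To make the selection rule concrete, here is a minimal sketch in Python of what "minimizing Wilks lambda" means at a single entry step. It assumes the usual definition of lambda as the determinant of the pooled within-groups sums-of-squares-and-cross-products (SSCP) matrix divided by the determinant of the total SSCP matrix. The function names (`wilks_lambda`, `best_next_predictor`) and the tiny data set are invented for illustration; this is not SPSS's implementation, and it omits the F or p criteria for entry and removal.

```python
# Sketch: Wilks' lambda = det(W) / det(T) for a set of predictors, plus one
# forward step that picks the candidate giving the smallest lambda.
# The data are MADE UP for illustration only.

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) < 1e-12:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def sscp(rows, cols):
    """Sums of squares and cross-products about the mean, for chosen columns."""
    n = len(rows)
    means = [sum(r[c] for r in rows) / n for c in cols]
    return [[sum((r[ci] - mi) * (r[cj] - mj) for r in rows)
             for cj, mj in zip(cols, means)]
            for ci, mi in zip(cols, means)]

def wilks_lambda(groups, cols):
    """groups: dict of label -> list of rows; cols: predictor column indices."""
    all_rows = [r for rows in groups.values() for r in rows]
    total = sscp(all_rows, cols)
    within = [[0.0] * len(cols) for _ in cols]
    for rows in groups.values():
        s = sscp(rows, cols)
        for i in range(len(cols)):
            for j in range(len(cols)):
                within[i][j] += s[i][j]
    return det(within) / det(total)

def best_next_predictor(groups, entered, candidates):
    """Return the candidate that minimizes lambda when added to `entered`."""
    scores = {c: wilks_lambda(groups, entered + [c]) for c in candidates}
    return min(scores, key=scores.get), scores

# Hypothetical data: column 0 separates the groups, column 1 does not.
groups = {
    1: [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0)],
    2: [(5.0, 4.0), (6.0, 5.0), (7.0, 3.0)],
    3: [(9.0, 3.0), (10.0, 4.0), (11.0, 5.0)],
}
best, scores = best_next_predictor(groups, [], [0, 1])
print("enter column", best, "lambdas:", scores)
```

A lambda near 0 means the groups are well separated on the chosen predictors; a lambda of 1 means the group means do not differ at all. A full stepwise procedure would repeat this step, test the winning candidate against the entry criterion, and check the variables already entered against the removal criterion.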

Imagine that you are working as a statistician for the Internal Revenue Service. You are told that another IRS employee has developed four composite scores (X1 - X4), easily computable from the information that taxpayers provide on their income tax returns and from other databases to which the IRS has access. These composite scores were developed in the hope that they would be useful for discriminating tax cheaters from other persons. To see if these composite scores actually have any predictive validity, the IRS selects a random sample of taxpayers and audits their returns. Based on this audit, each taxpayer is placed into one of three groups: Group 1 is persons who overpaid their taxes by a considerable amount, Group 2 is persons who paid the correct amount, and Group 3 is persons who underpaid their taxes by a considerable amount. X1 through X4 are then computed for each of these taxpayers. You are given a data file with group membership, X1, X2, X3, and X4 for each taxpayer, with an equal number of subjects in each group. Your job is to use discriminant function analysis to develop a pair of discriminant functions (weighted sums of X1 through X4) to predict group membership. You use a fully stepwise selection procedure to develop a (maybe) reduced (fewer than four predictors) model. You employ the WILKS method of selecting variables to be entered or deleted, using the default p criterion for entering and removing variables.

The data file is DFA-STEP.sav, which is available on Karl’s SPSS-Data page. Download it, then bring it into SPSS. To do the DFA, click Analyze, Classify, and then put Group into the Grouping Variable box, defining its range from 1 to 3. Put X1 through X4 in the “Independents” box, and select the stepwise method.

DFA-Step.doc


Click Continue. Click Method and select “Wilks’ lambda” and “Use probability of F.” Under Statistics, ask for the group means. Under Classify, ask for a territorial map. Click Continue, then OK.

Look at the output, “Variables Not in the Analysis.” At Step 0 the tax groups (overpaid, paid correct, underpaid) differ most on X3 (Wilks lambda drops to .636 if X3 is entered) and “Sig. of F to enter” is less than .05, so that predictor is entered first. After entering X3, all remaining predictors are eligible for entry, but X1 most reduces lambda, so it enters. The Wilks lambda is reduced from .635 to .171. On the next step, only X2 is eligible to enter, and it does, lowering Wilks lambda to .058. At this point no variable already in meets the criterion for removal and no variable out meets the criterion for entry, so the analysis stops.
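The entry decisions described above rest on a partial F test. As a sketch, the standard textbook form is given below; the symbols are notation introduced here and do not come from the SPSS output. Here n is the total number of cases, g the number of groups, q the number of predictors already in the model, and W and T the pooled within-groups and total SSCP matrices for the predictors in the model. SPSS's "F to enter" is assumed to follow this form.

$$
\Lambda_q = \frac{|\mathbf{W}_q|}{|\mathbf{T}_q|}, \qquad
\Lambda_{\text{partial}} = \frac{\Lambda_{q+1}}{\Lambda_q}, \qquad
F_{\text{enter}} = \frac{n - g - q}{g - 1} \cdot \frac{1 - \Lambda_{\text{partial}}}{\Lambda_{\text{partial}}}
$$

with g - 1 and n - g - q degrees of freedom, and with the lambda for an empty model taken as 1. For the second entry step above, where lambda falls from .635 to .171, the partial lambda is .171 / .635, or about .27. The smaller the partial lambda, the larger the F to enter.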

Look back at the Step 0 statistics. Only X2 and X3 were eligible for entry. Note, however, that after X3 was entered, the p to enter dropped for all remaining predictors. Why? X3 must suppress irrelevant variance in the other predictors (and vice versa). After X1 is added to X3, p to enter for X4 rises, indicating redundancy of X4 with X1.

## Interpretation of the Output from the Example Program

If you look at the standardized coefficients and loadings you will see that high scores on DF1 result from high X3 and low X1. If you look back at the group means you will see that those who underpaid are characterized by having low X3 and high X1, and thus low DF1. This suggests that DF1 is good for discriminating the cheaters (those who underpaid) from the others. The centroids confirm this.

If you look at the standardized coefficients and loadings for DF2 you will see that high DF2 scores come from having high X2 and low X1. From the group means you see that those who overpaid will have low DF2 (since they have a low X2 and a high X1). DF2 seems to be good for separating those who overpaid from the others, as confirmed by the centroids for DF2.

In the territorial map the underpayers are on the left, having a low DF1 (high X1 and low X3). The overpayers are on the lower right, having a high DF1 and a low DF2 (low X2, high X3, high X1). Those who paid the correct amount are in the upper right, having a high DF1 and a high DF2 (low X1, high X2, high X3).
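One way to read a territorial map is as a nearest-centroid rule in the plane of the two discriminant functions. The sketch below illustrates that idea in Python under two assumptions: equal prior probabilities (the groups here are of equal size) and discriminant scores scaled so that the pooled within-groups variance is 1. The centroid coordinates are hypothetical, chosen only to mimic the left / lower right / upper right layout described above. They are not values from the SPSS output, and `classify` is an illustrative helper, not an SPSS function.

```python
# Sketch: assign a case to the group whose centroid is nearest in the
# (DF1, DF2) plane. Centroids are HYPOTHETICAL placeholders that only mimic
# the layout described in the text; replace them with the centroids that
# SPSS reports under "Functions at Group Centroids".
import math

CENTROIDS = {
    "underpaid": (-3.0, 0.0),      # left: low DF1
    "overpaid": (1.5, -1.5),       # lower right: high DF1, low DF2
    "paid correct": (1.5, 1.5),    # upper right: high DF1, high DF2
}

def classify(df1, df2, centroids=CENTROIDS):
    """Return the label of the centroid closest to the point (df1, df2)."""
    return min(
        centroids,
        key=lambda g: math.hypot(df1 - centroids[g][0], df2 - centroids[g][1]),
    )

# Three made-up cases, one in each region of the map.
for point in [(-2.5, 0.4), (2.0, -1.0), (1.0, 2.0)]:
    print(point, "->", classify(*point))
```

With unequal priors the boundaries would shift toward the less likely groups, so this simple distance rule would no longer match the map exactly.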