Stepwise Discriminant Function Analysis


SPSS will do stepwise DFA. You simply specify which method you wish to employ for selecting predictors. The most economical method is the Wilks’ lambda method, which selects predictors that minimize Wilks’ lambda. As with stepwise multiple regression, you may set the criteria for entry and removal (F criteria or p criteria), or you may take the defaults.
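Wilks’ lambda for a set of predictors is the ratio of the determinant of the pooled within-groups SSCP matrix to the determinant of the total SSCP matrix, Λ = |W| / |T|, so smaller values mean better separation of the groups. A minimal NumPy sketch, assuming the scores sit in an array (cases by predictors) and the group codes in a vector; the function name `wilks_lambda` is hypothetical and is not part of SPSS:

```python
import numpy as np


def wilks_lambda(X, groups):
    """Wilks' lambda = det(W) / det(T) for the columns of X.

    X      : (n_cases, n_predictors) array, or a 1-D array for one predictor
    groups : length-n_cases array of group codes
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a single predictor as a one-column matrix
    groups = np.asarray(groups)

    # Total SSCP: deviations of every case from the grand means.
    dev_total = X - X.mean(axis=0)
    T = dev_total.T @ dev_total

    # Pooled within-groups SSCP: deviations from each case's own group means.
    W = np.zeros_like(T)
    for g in np.unique(groups):
        Xg = X[groups == g]
        dev_g = Xg - Xg.mean(axis=0)
        W += dev_g.T @ dev_g

    return np.linalg.det(W) / np.linalg.det(T)


# Toy data (made up): one predictor, three groups of three cases each.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
groups = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])
print(round(wilks_lambda(x, groups), 4))
```

On the toy data it prints 0.1: the within-groups sum of squares is 6 and the total sum of squares is 60.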


Imagine that you are working as a statistician for the Internal Revenue Service. You are told that another IRS employee has developed four composite scores (X1 through X4), easily computable from the information that taxpayers provide on their income tax returns and from other databases to which the IRS has access. These composite scores were developed in the hope that they would be useful for discriminating tax cheaters from other persons. To see if these composite scores actually have any predictive validity, the IRS selects a random sample of taxpayers and audits their returns. Based on this audit, each taxpayer is placed into one of three groups: Group 1 is persons who overpaid their taxes by a considerable amount, Group 2 is persons who paid the correct amount, and Group 3 is persons who underpaid their taxes by a considerable amount. X1 through X4 are then computed for each of these taxpayers. You are given a data file with group membership, X1, X2, X3, and X4 for each taxpayer, with an equal number of subjects in each group. Your job is to use discriminant function analysis to develop a pair of discriminant functions (weighted sums of X1 through X4) to predict group membership. You use a fully stepwise selection procedure to develop a possibly reduced model (one with fewer than four predictors). You employ the WILKS method of selecting variables to be entered or deleted, using the default p criterion for entering and removing variables.
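The fully stepwise procedure can be sketched in Python. At each step, a variable already in the model whose p to remove exceeds the removal criterion is dropped; otherwise, among the candidates whose p to enter is below the entry criterion, the one yielding the smallest Wilks’ lambda is entered. The partial F uses the usual transformation of the partial lambda Λp = Λ(with the variable) / Λ(without it): F = [(n - k - q)/(k - 1)] × (1 - Λp)/Λp on (k - 1, n - k - q) degrees of freedom, where n is the number of cases, k the number of groups, and q the number of other variables already in the model. The values of .05 to enter and .10 to remove are assumed here to mirror SPSS’s default p criteria; the helper names are hypothetical, and SPSS’s own algorithm also applies tolerance checks that this sketch omits.

```python
import numpy as np
from scipy.stats import f as f_dist


def wilks_lambda(X, groups, cols):
    """Wilks' lambda, det(W) / det(T), for the predictor columns in cols."""
    if not cols:
        return 1.0  # no predictors: no separation at all
    Xs = X[:, cols]
    dev = Xs - Xs.mean(axis=0)
    T = dev.T @ dev
    W = np.zeros_like(T)
    for g in np.unique(groups):
        Xg = Xs[groups == g]
        dg = Xg - Xg.mean(axis=0)
        W += dg.T @ dg
    return np.linalg.det(W) / np.linalg.det(T)


def partial_f_p(X, groups, var, others):
    """p value of the partial F for column `var`, given the columns in `others`."""
    n, k, q = len(groups), len(np.unique(groups)), len(others)
    lam = wilks_lambda(X, groups, others + [var]) / wilks_lambda(X, groups, others)
    df1, df2 = k - 1, n - k - q
    F = (df2 / df1) * (1.0 - lam) / lam
    return f_dist.sf(F, df1, df2)


def stepwise_wilks(X, groups, p_enter=0.05, p_remove=0.10, max_steps=50):
    """Fully stepwise selection by Wilks' lambda; returns the retained column indices."""
    X, groups = np.asarray(X, dtype=float), np.asarray(groups)
    selected = []
    for _ in range(max_steps):
        # Removal: drop the variable with the largest p to remove if it exceeds p_remove.
        if selected:
            p_out = {v: partial_f_p(X, groups, v, [u for u in selected if u != v])
                     for v in selected}
            worst = max(p_out, key=p_out.get)
            if p_out[worst] > p_remove:
                selected.remove(worst)
                continue
        # Entry: among candidates with p to enter below p_enter, take the smallest lambda.
        candidates = [v for v in range(X.shape[1]) if v not in selected]
        eligible = [v for v in candidates if partial_f_p(X, groups, v, selected) < p_enter]
        if not eligible:
            break  # nothing to remove and nothing eligible to enter
        selected.append(min(eligible, key=lambda v: wilks_lambda(X, groups, selected + [v])))
    return selected


# Toy data (not the IRS data): column 0 separates three groups, column 1 does not.
groups = np.repeat([0, 1, 2], 10)
X = np.column_stack([
    3.0 * groups + np.tile(np.linspace(-1.0, 1.0, 10), 3),
    np.tile([1.0, -1.0] * 5, 3),
])
print(stepwise_wilks(X, groups))
```

On the toy data it prints [0]: column 1 has identical group means and is uncorrelated with column 0 within groups, so it never becomes eligible for entry.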

Your data file is DFA-…, which is available on Karl’s SPSS-Data page -- download it and then bring it into SPSS. To do the DFA, click Analyze, Classify, Discriminant, and then put Group into the Grouping Variable box, defining its range from 1 to 3. Put X1 through X4 in the “Independents” box, and select the stepwise method.

     Copyright 2008 Karl L. Wuensch - All rights reserved.



Click Continue. Click Method and select “Wilks’ lambda” and “Use probability of F.” Under Statistics, ask for the group means. Under Classify, ask for a territorial map. Click Continue, then OK.

Look at the output, “Variables Not in the Analysis.” At Step 0 the tax groups (overpaid, paid correct, underpaid) differ most on X3 (Wilks’ lambda drops to .636 if X3 is entered), and “Sig. of F to enter” is less than .05, so that predictor is entered first. After X3 is entered, all remaining predictors are eligible for entry, but X1 most reduces lambda, so it enters. Wilks’ lambda is reduced from .635 to .171. On the next step, only X2 is eligible to enter, and it does, lowering Wilks’ lambda to .058. At this point no variable already in the model meets the criterion for removal and no variable outside it meets the criterion for entry, so the analysis stops.
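Each step’s improvement can be expressed as a partial Wilks’ lambda: lambda after the variable enters divided by lambda before it (lambda is 1 with no predictors). The sketch below uses only the lambda values reported above; the helper name `partial_lambda` is hypothetical, and because the sample size is not given here, the partial F values themselves are not computed.

```python
def partial_lambda(lambda_after, lambda_before):
    """Ratio of Wilks' lambda after a variable enters to lambda before it."""
    return lambda_after / lambda_before


# Wilks' lambda after each step, as reported in the text (1.0 = no predictors yet).
steps = [("none", 1.0), ("X3", 0.635), ("X1", 0.171), ("X2", 0.058)]

for (_, before), (entered, after) in zip(steps, steps[1:]):
    print(f"{entered} enters: partial lambda = {partial_lambda(after, before):.3f}")
```

This prints partial lambdas of 0.635, 0.269, and 0.339; the smallest ratio, at the X1 step, marks the largest proportional drop in lambda.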

Look back at the Step 0 statistics. Only X2 and X3 were eligible for entry. Note, however, that after X3 was entered, the p to enter dropped for all remaining predictors. Why? X3 must suppress irrelevant variance in the other predictors (and vice versa). After X1 is added to X3, p to enter for X4 rises, indicating redundancy of X4 with X1.
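The suppression effect can be reproduced with a small constructed example (invented data, not the IRS file). Below, `xa` carries group differences plus within-group noise, while `xb` has identical group means, so on its own it cannot discriminate, but it shares `xa`’s noise. Once `xa` is in the model, `xb` removes that irrelevant variance and lambda falls sharply.

```python
import numpy as np


def wilks_lambda(X, groups):
    """Wilks' lambda = det(W) / det(T) for the columns of X."""
    X = np.asarray(X, dtype=float).reshape(len(groups), -1)
    dev = X - X.mean(axis=0)
    T = dev.T @ dev
    W = np.zeros_like(T)
    for g in np.unique(groups):
        Xg = X[groups == g]
        dg = Xg - Xg.mean(axis=0)
        W += dg.T @ dg
    return np.linalg.det(W) / np.linalg.det(T)


groups = np.repeat([0, 1, 2], 5)                  # three groups of five cases
noise = np.tile([-2.0, -1.0, 0.0, 1.0, 2.0], 3)   # same pattern in every group
jitter = np.tile([0.5, -0.5, 0.0, -0.5, 0.5], 3)  # small extra, group mean zero

xa = groups + noise   # group signal plus noise
xb = noise + jitter   # no group signal, but shares xa's noise

lam_a = wilks_lambda(xa, groups)
lam_b = wilks_lambda(xb, groups)
lam_ab = wilks_lambda(np.column_stack([xa, xb]), groups)

print(f"xa alone:  {lam_a:.4f}")
print(f"xb alone:  {lam_b:.4f}")
print(f"xa and xb: {lam_ab:.4f}")
print(f"partial lambda of xb given xa: {lam_ab / lam_a:.4f}")
```

It prints 0.7500, 1.0000, 0.2143, and 0.2857: `xb` alone does nothing, yet entered after `xa` it cuts lambda to under a third of its previous value.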

Interpretation of the Output from the Example Program

If you look at the standardized coefficients and loadings, you will see that high scores on DF1 result from high X3 and low X1. If you look back at the group means, you will see that those who underpaid are characterized by having low X3 and high X1, and thus low DF1. This suggests that DF1 is good for discriminating the cheaters (those who underpaid) from the others. The centroids confirm this.

If you look at the standardized coefficients and loadings for DF2, you will see that high DF2 scores come from having high X2 and low X1. From the group means you see that those who overpaid will have low DF2 (since they have a low X2 and a high X1). DF2 seems to be good for separating those who overpaid from the others, as confirmed by the centroids for DF2.
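How discriminant functions, standardized coefficients, and centroids arise can be sketched with NumPy, assuming the textbook scaling: the raw weights are eigenvectors of W⁻¹B (B being the between-groups SSCP matrix), each function is scaled to unit pooled within-groups variance, standardized weights multiply each raw weight by the predictor’s pooled within-groups standard deviation, and centroids are the group means of the centered scores. This is not SPSS’s own code (SPSS may, for example, flip the sign of a function), the function name is hypothetical, and the toy data are invented.

```python
import numpy as np


def discriminant_functions(X, groups):
    """Canonical discriminant functions from the eigenvectors of W^-1 B.

    Returns (eigenvalues, raw weights, standardized weights, centroids); each
    weight matrix has one row per predictor and one column per function.
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n, p = X.shape
    k = len(labels)

    grand = X.mean(axis=0)
    W = np.zeros((p, p))  # pooled within-groups SSCP
    B = np.zeros((p, p))  # between-groups SSCP
    for g in labels:
        Xg = X[groups == g]
        dev = Xg - Xg.mean(axis=0)
        W += dev.T @ dev
        m = (Xg.mean(axis=0) - grand)[:, None]
        B += len(Xg) * (m @ m.T)

    # W^-1 B is not symmetric but has real eigenvalues; drop round-off imaginary parts.
    evals, evecs = np.linalg.eig(np.linalg.solve(W, B))
    order = np.argsort(evals.real)[::-1][: min(k - 1, p)]
    evals, raw = evals.real[order], evecs.real[:, order]

    # Scale each function to unit pooled within-groups variance (df = n - k).
    S_within = W / (n - k)
    raw = raw / np.sqrt(np.diag(raw.T @ S_within @ raw))

    # Standardized weights: raw weight times the predictor's pooled within-groups SD.
    standardized = raw * np.sqrt(np.diag(S_within))[:, None]

    # Scores are centered at the grand mean; centroids are group means of the scores.
    scores = (X - grand) @ raw
    centroids = np.array([scores[groups == g].mean(axis=0) for g in labels])
    return evals, raw, standardized, centroids


# Constructed two-predictor, three-group toy data (not the IRS data).
groups = np.repeat([0, 1, 2], 5)
noise = np.tile([-2.0, -1.0, 0.0, 1.0, 2.0], 3)
jitter = np.tile([0.5, -0.5, 0.0, -0.5, 0.5], 3)
X = np.column_stack([groups + noise, noise + jitter])

evals, raw, standardized, centroids = discriminant_functions(X, groups)
print("eigenvalues:", evals.round(4))
print("centroids:\n", centroids.round(4))
```

For the toy data the eigenvalues are 11/3 and 0, and the product of 1/(1 + eigenvalue) over the functions reproduces Wilks’ lambda for both predictors (3/14, about .214).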

In the territorial map the underpayers are on the left, having a low DF1 (high X1 and low X3). The overpayers are on the lower right, having a high DF1 and a low DF2 (low X2, high X3, high X1). Those who paid the correct amount are in the upper right, having a high DF1 and a high DF2 (low X1, high X2, high X3).
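A territorial map divides the plane of DF1 and DF2 scores into regions, each labeled with the group a case at that point would be assigned to. Assuming equal prior probabilities, and because the functions have unit, uncorrelated within-groups variance, this amounts to assigning each point to the nearest group centroid. In the sketch below the centroid coordinates are made up to mimic the layout described above; they are not the SPSS output, and the helper names are hypothetical.

```python
import math

# Hypothetical centroids on (DF1, DF2), chosen only to mimic the layout
# described above; they are NOT taken from the SPSS output.
CENTROIDS = {
    "underpaid": (-2.0, 0.0),
    "overpaid": (1.0, -1.5),
    "paid correct": (1.0, 1.5),
}


def territory(df1, df2, centroids=CENTROIDS):
    """Group whose centroid is nearest (Euclidean distance) to the point (df1, df2)."""
    return min(centroids, key=lambda g: math.dist((df1, df2), centroids[g]))


def ascii_map(xmin=-4.0, xmax=4.0, ymin=-3.0, ymax=3.0, step=0.5):
    """Rough text territorial map: one letter per grid point, DF1 across, DF2 up."""
    letters = {"underpaid": "U", "overpaid": "O", "paid correct": "C"}
    rows = []
    y = ymax
    while y >= ymin:
        row = ""
        x = xmin
        while x <= xmax:
            row += letters[territory(x, y)]
            x += step
        rows.append(row)
        y -= step
    return "\n".join(rows)


print(ascii_map())
```

The printed grid shows U on the left, C in the upper right, and O in the lower right, matching the arrangement described in the text.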
