Data Screening - Data Analysis with SPSS

By Tommy Duncan, 2014-01-10 22:08

Statistics Spring 2008

    Lab #1 Data Screening

The purpose of data screening is to:

    (a) check whether data have been entered correctly, for example by looking for out-of-range values.
    (b) check for missing values, and decide how to deal with them.
    (c) check for outliers, and decide how to deal with them.
    (d) check for normality, and decide how to deal with non-normality.

1. Finding incorrectly entered data

    • Your first step in data screening is to run “Frequencies”:

    1. Select Analyze --> Descriptive Statistics --> Frequencies

    2. Move all variables into the “Variable(s)” window.

    3. Click OK.

    • The output below covers only the four “system” variables in our dataset, because pasting the output for every variable would take up too much space in this document.

    • The “Statistics” box tells you the number of missing values for each variable. We will use this information later when we discuss missing values.

    • Each variable is then presented as a frequency table. For example, below we see the output for “system1”. The coding manual for the “Legal beliefs” survey shows that the valid responses for “system1” are 1 through 11, but the output below contains an out-of-range value: “13”. (NOTE: your dataset will not contain a “13”, because I gave you the screened dataset; I inserted the “13” into this example to show you what an out-of-range value looks like.) Since 13 is an invalid value, you need to find out why it was entered. Did the person entering the data make a mistake? Or did the subject respond with “13” even though the question indicated that only the numbers 1 through 11 are valid? You can identify the source of the error by looking at the hard copies of the data. First, identify which subject has the “13”: click on the variable name (system1) to highlight its column, use the “find” function (Edit --> Find) to locate the value, and then scroll to the left to read the subject number. Then hunt down the hard copy of the data for that subject number.
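The same check can also be sketched outside SPSS. The short Python example below is only an illustration under stated assumptions: the records, the subject numbers, and the helper names `frequency_table` and `out_of_range_subjects` are all hypothetical, and only the valid range of 1 through 11 for “system1” comes from the coding manual described above. It builds a frequency table for one variable and lists the subjects whose value falls outside the valid range.

```python
from collections import Counter

# Hypothetical records: subject numbers and responses are invented for illustration.
records = [
    {"subject": 101, "system1": 4},
    {"subject": 102, "system1": 11},
    {"subject": 103, "system1": 13},    # deliberately out of range
    {"subject": 104, "system1": None},  # missing response
    {"subject": 105, "system1": 4},
]

# Valid codes for "system1" are 1 through 11, per the coding manual.
VALID_SYSTEM1 = range(1, 12)


def frequency_table(rows, variable):
    """Count how often each non-missing value of `variable` occurs."""
    return Counter(row[variable] for row in rows if row[variable] is not None)


def out_of_range_subjects(rows, variable, valid_values):
    """Return (subject, value) pairs whose value is present but not a valid code."""
    return [
        (row["subject"], row[variable])
        for row in rows
        if row[variable] is not None and row[variable] not in valid_values
    ]


freq = frequency_table(records, "system1")
for value in sorted(freq):
    print(value, freq[value])

# Subjects to look up in the hard copies of the data.
print(out_of_range_subjects(records, "system1", VALID_SYSTEM1))
```

Missing responses are left out of the frequency counts here so that they are not mistaken for out-of-range values; they are a separate issue, discussed in the next section.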


2. Missing Values

    • Below, I describe in depth how to identify and deal with missing values.

    • Why do missing values occur? Missing values are either random or non-random. Random missing values may occur because the subject inadvertently skipped some questions. For example, the study may be overly complex or long, or the subject may be tired or not paying attention, and so miss a question. Random missing values may also occur through data entry mistakes. Non-random missing values may occur because the subject purposefully did not answer some questions. For example, a question may be confusing, so many subjects leave it blank. A question may not provide appropriate answer choices, such as “no opinion” or “not applicable”, so the subject chooses not to answer it. Subjects may also be reluctant to answer some questions because of social desirability concerns, as with questions about sensitive topics such as past crimes, sexual history, or prejudice or bias toward certain groups.

    • Why are missing data a problem? Missing values mean a reduced sample size and a loss of data. You conduct research to measure empirical reality, so missing values thwart the purpose of research. Missing values may also indicate bias in the data. If the missing values are non-random, then the study is not accurately measuring the intended constructs. The results of your study might have been different if the missing data had not been missing.

    • How do I identify missing values?
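As a rough illustration of what identifying missing values involves, the Python sketch below counts missing responses per variable (similar in spirit to the counts in the “Statistics” box from Frequencies) and per subject. Everything in it is an assumption made for the example: the records, the variable name “system2”, and the helper names `missing_per_variable` and `missing_per_subject` are hypothetical, and a missing response is represented as `None`.

```python
# Hypothetical records; None marks a missing response.
records = [
    {"subject": 101, "system1": 4, "system2": 7},
    {"subject": 102, "system1": None, "system2": 3},
    {"subject": 103, "system1": 9, "system2": None},
    {"subject": 104, "system1": None, "system2": None},
]

# Variables to screen (names assumed for illustration).
VARIABLES = ["system1", "system2"]


def missing_per_variable(rows, variables):
    """Number of missing responses for each variable."""
    return {v: sum(1 for row in rows if row[v] is None) for v in variables}


def missing_per_subject(rows, variables):
    """Number of missing responses for each subject."""
    return {
        row["subject"]: sum(1 for v in variables if row[v] is None)
        for row in rows
    }


by_variable = missing_per_variable(records, VARIABLES)
by_subject = missing_per_subject(records, VARIABLES)

print(by_variable)
print(by_subject)
```

Counting by variable points to questions that many subjects skipped, which may suggest non-random missingness; counting by subject points to individuals who skipped many questions.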