A LITTLE SAS GUIDE WITH EXAMPLES

By Lois Ortiz,2014-07-06 12:38
Copyright © mh2007

    Table of Contents

Section 1 Preliminaries

    1.1 Getting Started with SAS

    1.2 More on Data Management by Example

Section 2 Parametric Tests

    2.1 Getting Acquainted with the T-test

    2.2 ANOVA by Example

    2.3 Linear Regression by Example

Section 3 Nonparametric Tests

    3.1 Wilcoxon Rank Sum Test and Kruskal-Wallis Test by Example

Section 4 Logistic Regression

    4.1 Logistic Regression by Example

Section 1 Preliminaries

1.1 Getting Started With SAS

    Before creating a data set, decide what information you want it to convey. The biggest problem that researchers have is coming up with a clear and concise question and sticking to it. Assuming the data of interest have already been obtained from a primary source (your own data collection) or a secondary source (another investigator's data, or perhaps data found on the internet), you must now bring the data into a usable format. This is referred to as creating a data set. The following steps are helpful:

Using SAS

First, name the data set you would like to create.

data one;

    The data set created by this statement is given the name one. It remains empty until actual raw data are brought in.

    Before going any further, one important thing to remember when using SAS is to put a semicolon after each statement. This can get tedious at times, but is something that should be kept in mind before starting anything in SAS.

    There are several ways to bring in your data. One method is to key your data directly into SAS. Before you start, you must use an input statement, as in the following line:

input Sub_name $ weight age;

    The input statement tells SAS not only the variable name for each column of your data but also the type of data you will be bringing in. The two types of data that we will be concerned with are character and numeric. From the input line, it is easy to see that Sub_name is the first column variable and refers to the subject's name. Since names are character data, we need to let SAS know this, which is accomplished with the dollar sign ($). Now SAS knows it will be reading character (alphanumeric) data for the variable Sub_name. The other variables, weight and age, will be considered numeric. Note that SAS can read formats other than plain character and numeric data. It can read dates, for example, but we will leave that issue for a later discussion. Once SAS knows what type of data you will be using, a datalines statement is required before data entry can begin.

datalines;

    The datalines statement tells SAS that the raw data follow immediately and should be read from the lines below it. After the datalines statement, you can begin entering data, using a space as a delimiter so that SAS can distinguish between the values of the different variables. The following is an example of some observation lines:

JaySmith 125 22
MaryCarr 115 20
JoeJones 177 46
;

    Notice that the data lines are followed by a semicolon on a line of its own. This indicates that no more records will be entered and that SAS has reached the end of the raw data. Once these steps have been completed, procedure statements, called proc statements, can be used to view, manipulate, or analyze the data. Keep in mind that a “run” statement usually follows each proc step. You can use the following proc statements to view the contents of what SAS has read into data set one:

proc print;
run;

    If you are using the Windows version of SAS, just point to the running man on the toolbar and click. This is an execute command that submits the code and runs the print procedure.
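    The pieces introduced so far can be assembled into one program. The following is a minimal sketch that uses only the statements and data lines shown above. The explicit data=one option on proc print and the run statement after the data step are additions for clarity, not requirements of the original steps.

```sas
/* Create the data set "one" from data typed directly into the program. */
data one;
    input Sub_name $ weight age;   /* the $ marks Sub_name as character */
    datalines;
JaySmith 125 22
MaryCarr 115 20
JoeJones 177 46
;
run;

/* Print the data set that was just created. */
proc print data=one;
run;
```

    Naming the data set in proc print is optional here, because proc print uses the most recently created data set when no name is given.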

    One thing to be aware of is that, by default, SAS reads at most eight characters for a character variable. Names longer than eight characters are cut off after the eighth character. This problem is easily handled by specifying the columns needed to accommodate the longest value being entered. For example, assume that you would like to read in the following data set:

Jay Smith     125 22
Mary Carr     115 20
Joe Jones     177 46
Rachel Lonely 105 27
;

    Now we have two problems that must be addressed. The first is that some of the subjects' names are more than eight characters in length. The other is that there are spaces between the first and last names of the subjects. This is a problem because SAS uses a space as a delimiter. When SAS encounters the space within a name, it assumes that the next variable is beginning, which is not the case in our example. The following input statement corrects our problem.

input Sub_name $ 1-13 weight age;

    By specifying the columns (here 1 through 13) needed to accommodate the longest name, our problem is solved. Column ranges can be specified for both character and numeric data as follows:

input patient $ 1-13 weight 15-17 age 19-20;

Now try executing the following statements:

data one;
    input patient $ 1-13 weight 15-17 age 19-20;
    datalines;
Jay Smith     125 22
Mary Carr     115 20
Joe Jones     177 46
Rachel Lonely 105 27
;

proc print;
run;

This is the output you should have obtained:

                  The SAS System

    Obs    patient          weight    age

     1     Jay Smith          125      22
     2     Mary Carr          115      20
     3     Joe Jones          177      46
     4     Rachel Lonely      105      27
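    When the names contain no embedded spaces, the eight-character default can also be raised without column ranges. The following is a sketch of that alternative using a length statement. The data set name two and the three names in the data lines are hypothetical and chosen only to be longer than eight characters.

```sas
/* Hypothetical example: names longer than eight characters, no embedded spaces. */
data two;
    length Sub_name $ 12;          /* reserve 12 characters instead of the default 8 */
    input Sub_name $ weight age;   /* space-delimited input, as in the first example */
    datalines;
JaySmithson 125 22
MaryCarlson 115 20
JoeJohnston 177 46
;
run;

proc print data=two;
run;
```

    This approach does not help with names such as Jay Smith, because the space inside the name would still be treated as a delimiter. For those, the column ranges shown above remain the simpler choice.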

Now let's explore some other ways of bringing in data.

    Assume that the same information is saved as a text file, “example.txt”, in ASCII format in the location “C:\Temp” on your PC. This file can be viewed with an ASCII editor such as Notepad. The following is a copy of the contents stored in the file “example.txt”:

Jay Smith     125 22
Mary Carr     115 20
Joe Jones     177 46
Rachel Lonely 105 27

    Instead of copying and pasting this information from Notepad into SAS, the data stored in “example.txt” can be brought in using the following set of statements:

data one;
    infile 'C:\Temp\example.txt';
    input patient $ 1-13 weight 15-17 age 19-20;

proc print;
run;

    The infile statement gives SAS the location of the text file where the data of interest are stored. Notice that the same input statement is used as if the data were keyed directly into SAS.

    Now, assume that the same information is stored as an EXCEL file, example.xls, in the location “C:\Temp” on your PC. Although EXCEL is a good spreadsheet program, it is not very accommodating if complex analysis of your data is required. Therefore, importing your data from EXCEL into SAS is often necessary, and can be done easily with the following procedure statements:

proc import datafile = 'C:\Temp\example.xls' out = exdata1
    replace;

proc print;
run;

    The proc import statement allows the SAS user to import data from an EXCEL spreadsheet into SAS. The datafile option provides the location of the file. In this case, the file “example” with the extension “.xls”, which denotes an EXCEL file, is the file we would like to import. The out option names the SAS data set that is created by the import procedure, and the replace option allows an existing data set with that name to be overwritten. Notice that the print procedure has been used once more in order to view the contents of the SAS data set exdata1.
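    The import step can also be written with the file type stated explicitly rather than inferred from the extension. The following is a sketch only. It assumes that the SAS/ACCESS Interface to PC Files is available and that the first row of the spreadsheet holds the variable names, neither of which is stated in the example above.

```sas
/* Import the spreadsheet with the file type named explicitly. */
proc import datafile = 'C:\Temp\example.xls'
            out      = exdata1
            dbms     = xls        /* assumption: PC Files support is installed */
            replace;
    getnames = yes;               /* assumption: row 1 contains variable names */
run;

proc print data = exdata1;
run;
```

    If the spreadsheet has no header row, getnames = no makes SAS generate variable names instead of taking them from the first row of data.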

    Although we have mentioned character and numeric data types, we have yet to discuss formatting numeric data, and how to work with variables that have dates as observations.

    Looking at the following data set may help us understand the steps involved when working with dates as observations and when formatting numeric data. The columns in this data set refer to the name of the subject, the date they were observed, their date of birth, and their weight in pounds, which should have two digits behind the decimal point.

Mary   11/12/2002 06/05/78 12567
Joe    05/14/2001 07/08/67 15634
James  01/09/2002 02/28/64 16790

    In this case the researcher would like to create a SAS data set using this information. Later, the researcher would like to use the date of observation and the date of birth to compute the age of each subject. The researcher also notices that the decimal point in the weight data is missing. Although these problems may seem complex to the novice SAS user, they can easily be remedied with the use of informats. Informats allow the user to specify, in the input statement, how the raw values of each variable should be read. This lets SAS know what type of data it is reading before it reads it.

    The following is an input statement that contains informats and other symbols that we have yet to discuss:

input sub $ 1-5 @8 obs_date MMDDYY10. @19 dob MMDDYY8. @28 weight d5.2;

    The informat “MMDDYY10.” is commonly used when reading in dates. Notice that the date of observation occupies ten characters: eight digits and two forward slashes. The date of birth occupies eight characters: six digits and two forward slashes. The variable weight uses the informat “d5.2”, which tells SAS to read five digits and, because the raw data contain no decimal point, to place the last two behind a decimal point. Something else that should look different is the “@” followed by a number in front of the variable names. The “@” tells SAS the column at which to begin reading the next variable. For example, “@8” refers to column 8, so SAS will begin reading the “obs_date” data at column eight.

    At this point, we are ready to create a data set. Let's call it testing, and it should look something like the following:

data testing;
    input sub $ 1-5 @8 obs_date MMDDYY10. @19 dob MMDDYY8. @28 weight d5.2;
    datalines;
Mary   11/12/2002 06/05/78 12567
Joe    05/14/2001 07/08/67 15634
James  01/09/2002 02/28/64 16790
;

proc print;
run;

After running this code, you should obtain the following output:

                  The SAS System

    Obs    sub      obs_date     dob     weight

     1     Mary       15656     6730     125.67
     2     Joe        15109     2745     156.34
     3     James      15349     1519     167.90

    One thing that should be readily noticeable is that numeric values are printed in place of the dates. Those values represent the number of days since January 1, 1960, which is the reference date (day 0) for SAS date values. Also note that the weight of the subjects is in the desired format.
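    The stored day counts can be displayed as readable dates by attaching a date format when printing. The following is a small sketch that assumes the data set testing created above already exists. The choice of the MMDDYY10. format for display is an illustration, and other date formats would work in the same way.

```sas
/* Print the same data set, showing the date variables as mm/dd/yyyy. */
proc print data=testing;
    format obs_date dob mmddyy10.;   /* changes the display only, not the stored values */
run;
```

    The underlying values remain day counts, so arithmetic on the dates is unaffected by the format.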

    The calculation of the age for each subject is now relatively easy. By adding one line to the previous SAS code, we can subtract the date of birth from the observation date and divide by 365.25. I use 365.25 because it takes into account the fact that every fourth year is a leap year. The results we desire should be provided with the following SAS code:

data testing;
    input sub $ 1-5 @8 obs_date MMDDYY10. @19 dob MMDDYY8. @28 weight d5.2;
    age = (obs_date - dob)/365.25;
    datalines;
Mary   11/12/2002 06/05/78 12567
Joe    05/14/2001 07/08/67 15634
James  01/09/2002 02/28/64 16790
;

proc print;
run;

    Notice that the line containing age is the only difference between this piece of SAS code and the one previously written. The following is the output that should have been obtained:

                       The SAS System

    Obs    sub      obs_date     dob     weight      age

     1     Mary       15656     6730     125.67    24.4381
     2     Joe        15109     2745     156.34    33.8508
     3     James      15349     1519     167.90    37.8645
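    The first row of this output can be checked by hand: 15656 - 6730 = 8926 days, and 8926/365.25 is approximately 24.4381 years. The same check can be sketched in SAS with the MDY function, which builds a SAS date value from a month, day, and year. The use of a data _null_ step and the put statement to write to the log is a choice made for this illustration and is not part of the original example.

```sas
/* Recompute the values for the first subject (Mary) and write them to the log. */
data _null_;
    obs_date = mdy(11, 12, 2002);         /* date of observation */
    dob      = mdy(6, 5, 1978);           /* date of birth */
    age      = (obs_date - dob) / 365.25; /* age in years */
    put obs_date= dob= age= 7.4;
run;
```

    The values written to the log should agree with the first row of the printed output above.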

1.2 More on Data Management by Example

    The following tutorial takes you through some steps that will help you get better acquainted with data management using SAS. First locate the file “anthros.txt” on the diskette provided or on your hard drive. The file should contain the following lines of text:

The following is a list of five individuals along with their
corresponding height in inches and weight in pounds.

Bruce Burks   66 128
Janie Johnson 62 135
Joe Jacobs    72 157
Jim Lewis     70 186
Linda Carol   60 119

    There are quite a few things to consider before reading in this raw data. The first thing that you should notice is that the first two lines give you information about your data set. The third line is a blank line, and the data of interest starts on the fourth line. Unfortunately, SAS cannot distinguish between text that is giving information about the data and the data itself; however, this dilemma can easily be resolved in the following manner:

data anthro;
    infile 'C:\Temp\anthros.txt' firstobs = 4;
    input Name $ 1-13 Height 15-16 Weight 18-20;

    Here the data set that was created was named anthro. Next we used an INFILE statement to bring in the raw data from either a diskette or our hard drive. Notice the “firstobs” option in the INFILE statement. The “firstobs” option tells SAS that the data to be read begin on the “nth” line of the file, where “n” is a positive integer. In our example, n is 4, so reading begins on the fourth line. The “firstobs” option is especially useful when you acquire data sets that have extraneous information at the top and you want to move directly to the analysis phase.
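    To confirm that the header lines were skipped, the data step can be followed by a print step. The following is a sketch that repeats the statements above and adds the run statements and the proc print step. It assumes that “anthros.txt” is stored in “C:\Temp” and that its data lines are laid out in the fixed columns named in the input statement.

```sas
/* Read anthros.txt, starting at line 4 to skip the two description lines and the blank line. */
data anthro;
    infile 'C:\Temp\anthros.txt' firstobs = 4;
    input Name $ 1-13 Height 15-16 Weight 18-20;
run;

/* Five observations are expected, one per individual listed in the file. */
proc print data = anthro;
run;
```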

    For the sake of completeness, let's take another example. The following text file contains exactly the same information as the previous one. The only difference is that it is arranged differently.

Bruce Burks   66 128
Janie Johnson 62 135
Joe Jacobs    72 157
Jim Lewis     70 186
Linda Carol   60 119

This is a list of five individuals along with their corresponding
height in inches and weight in pounds.
