DOC

Base_Four_07

By Leo Simmons,2014-08-31 00:22
7 views 0
Base_Four_07

Topic: Overview of Multiple Data Sets

    1. Concatenating data set using Set statement

    2. Interleaving several datasets

    3. Modifying (creating) data set

    4. Concatenating data set using PROC APPEND

    5. Merging datasets

    6. Updating dataset

    7. PROC EXPORT/IMPORT

    8. PROC CONTENTS

     1

    Overview

    One of SAS’s greatest strengths is its ability to combine and process more than one

    data set at a time. The main tools used to do this are the SET, MERGE and UPDATE

    statements.

    1. Concatenating data set using Set statement It reads an observation from one or more SAS data sets. The SET statement is flexible

    and has a variety of uses in SAS programming. These uses are determined by the options

    and statements that you use with the SET statement: data one;

     input year pop $ @@;

    datalines;

    1991 500K 1992 501K 1993 502K

    ;

    data two;

     input year pop $ @@;

    datalines;

    1991 400K 1992 401K 1993 402K

    ;

    data three;

     input year pop $ @@;

    datalines;

    1991 300K 1992 301K 1993 302K 1994 303K

    ;

    data combine_1;

     set one two three;

    run;

data con1;

     input custom_id $ product $ 12.;

    cards;

    28901 pentium IV

    36815 pentium III

    21224 pentium IV

    ;

    data con2;

     input custom_id $ product $ 12.;

    cards;

    18601 pentium IV

    24683 pentium III

    851921 pentium IV

    61831 pentium IV

    ;

    data con3;

     set con1;

     set con2;

    run;

    2. Interleaving several datasets

    If you want to combine several datasets so that observations sharing a common value are

    all adjacent to each other, you can list the datasets on SET statement, and specify the

    variable to be used on a BY statement. Note: The data sets to be interleaved must already

    be sorted by the variable(s) listed in the BY statement.

     2

/*Creating a new data set from multiple data sets based upon sorted

    order*/;

    data animal;

     input common $ animal $; datalines;

    a Ant

    a Ape

    b Bird

    c Cat

    d Dog

    ;

    data plant;

     input common $ plant $; datalines;

    a Apple

    b Banana

    c Coconut

    d Dewberry

    e Eggplant

    f Fig

    ;

    data interleaving;

     set animal plant;

     by common;

    run;

    proc print data=interleaving; run;

    3. The SET statement is used to modify an existing SAS data set

    and Reading Observations Using Direct Access

    data old;

     input x1-x3;

    datalines;

    1 2 3

    3 4 5

    ;

    data new;

     set old;

     xtot=sum(of x1-x3); run;

data a;

    input x y @@;

    cards;

    901 1 902 2 903 3 904 4 905 5

    ;

    data b;

    x=2; /*select a given obs*/ set a point=x;

    output;

    stop;

    run;

     3

    To create a temporary numeric variable whose value is used to detect the last observation, you can use the END= option in the SET statement.

    data en;

     input accnt balance day @@;

    cards;

    901 486 1 901 985 4 903 498 2 903 498 2

    ;

    data en2;

     set en end=last;

     if last;

    run;

    4. Concatenating data set using PROC APPEND

    Concatenating SAS data sets is the process of storing observations one after another until all the data sets and their observations have been combined into one data set. Many users perform the concatenation process using a DATA step (as shown in the previous examples), but there are good reasons for using the APPEND procedure. If you use the SET statement in a DATA step to concatenate two data sets, the SAS System must process all the observations in both data sets to create a new one. The APPEND procedure bypasses the processing of data in the original data set and adds new observations directly to the end of the original data set. It does this by positioning the record pointer at the end of the original data set, and starting the processing directly with the new observations.

PROC APPEND

    Syntax: BASE= SAS-data-set <DATA=SAS-data-set> <FORCE>

    Since PROC APPEND reads only the second data set, set BASE= to the larger data set. However, the order of the data sets does not matter.

data master;

     input city $ 1-11 month $10. temp;

     cards;

    Honolulu August 80.7

    Honolulu January 72.3

    Boston July 73.3

    Boston January 29.2

    Duluth July 65.6

    Duluth January 8.5

    New York August 82.7

    New York January 22.3

    ;

    data add;

     input city $ 1-11 month $10. temp;

     cards;

    Raleigh July 77.5

    Raleigh January 40.5

    Miami August 82.9

    Miami January 67.2

    Los Angeles August 69.5

    Los Angeles January 54.5

    ;

     4

run;

    proc append base=master data=add;

    run;

data weather;

     input date mmddyy8. temp sunhrs;

     cards;

    01-01-90 35 9.3

    01-02-90 34 9.33

    01-03-90 37 9.35

    01-04-90 38 9.39

    ;

    data daily;

     input date mmddyy8. temp sunhrs precip;

     cards;

    01-05-90 35 9.40 1.1

    ;

    run;

    proc append base= weather data=daily force;

    run;

    What are the advantages or/and disadvantages of SET statement and PROC APPEND?

    1. PROC APPEND is more efficient for appending two data sets. Because PROC

    APPEND performs an update in place on the BASE= data set; therefore, it just

    adds the observations from the DATA= data set to the end of the BASE= data set.

    The observations in the BASE= data set are not read or processed.

    2. The DATA with SET does not perform an update in place, thus the original data

    sets would not be damaged should the process terminate abnormally.

    3. If the SAS job terminates abnormally while the APPEND procedure is processing,

    the BASE= data set will be marked as damaged.

    4. PROC APPEND cannot add variables to the BASE= data set, it can only add

    observations to the existing structure of the BASE= data set.

    5. If you need to add new variables and observations to the BASE= data set, a

    DATA step with a SET statement is the solution.

    5. Merging datasets

    Overview

    The Merge statement is flexible and has a variety of uses in SAS programming. It joins observations from two or more SAS data sets into single observations. This section describes basic uses of MERGE. Other applications include using more than one BY variable, merging more than two data sets, and merging a few observations with all observations in another data set.

One-to-One Merging

    One-to-one merging combines observations from two or more SAS data sets into a single observation in a new data set. To perform a one-to-one merge, we use the MERGE statement without a BY statement. SAS combines the first observation from all data sets that are named in the MERGE statement into the first observation in the new data set, the second observation from all data sets into the second observation in the new data set, and

     5

so on. In a one-to-one merge, the number of observations in the new data set is equal to

    the number of observations in the largest data set named in the MERGE statement. Example:

    /*One-to-One MERGE by combining two data sets*/;

    data merg_one;

     input name $ age;

    datalines;

    Chris 36

    Jane 21

    Jerry 30

    Joe 49

    ;

    data merg_two;

     input name $ salary;

     format salary dollar10.;

    datalines;

    Chris 33000

    Jane 40000

    Jerry 60000

    Joe 26000

    Zoe 60000

    ;

    data both;

     merge merg_one

     merg_two;

    run;

    proc print data=both;

    run;

Match-Merging

    Match-merging combines observations from two or more SAS data sets into a single observation in a new data set according to the values of a common variable. The number of observations in the new data set is the sum of the largest number of observations in each BY group in all data sets. To perform a match-merge, we use a BY statement immediately after the Merge statement. The variables in the BY statement must be

    common to all data sets. Only one BY statement can accompany each MERGE statement in a DATA step. The data sets that are listed in the MERGE statement must be sorted in order of the values of the variables that are listed in the BY statement. Examples:

/* Create sample data */ ;

    data merg1;

     input id name& $20.;

    datalines; 1 Nay Rong 2 Kelly Windsor 3 Julio Meraz 4 Richard Krabill 5 Rita Giuliano ;

    data merg2;

     input id sale;

     format sale dollar10.; datalines;

     6

    1 28000 2 30000 3 35000 4 25000 5 40000 ;

    data merg3;

     input id bonus;

     format bonus dollar10.; datalines; 1 2000 2 4000 3 3000 4 2500 5 2800 ;

    data final;

     merge merg1

     merg2

     merg3;

     by id; run; proc print data=final;

    run;

/* One-to-many or many-to-one merge */;

    /* Goal: Combine two data sets by common variables when there are

     duplicates in only one data set.*/;

    data one;

     input id $ fruit $;

    datalines;

    a apple

    a apple

    b banana

    c coconut

    ;

data two;

     input id $ color $;

    datalines;

    a amber

    b brown

    c cream

    c cocoa

    c carmel

    ;

data both;

     merge one two;

     by id;

    run;

Other examples with Merge In=:

    /*Merge multiple data sets and output matches only*/;

    data file1;

     7

     input var name $;

    datalines;

    100 Anja

    200 Bob

    400 Chandra

    600 Darrin

    ;

    data file2;

     infile cards dsd truncover;

     input var address $ 13.;

    datalines;

    100,34 Smith Road

    200,67 Burt Ave

    300,12 You St

    400,45 Younge St

    500,79 Wellington

    600,23 Done Road

    ;

    data file3;

     input var zip;

    datalines;

    100 28092

    200 27502

    300 27539

    600 27526

    ;

    data three;

     merge file1 (in=a) file2 (in=b) file3 (in=c);

     by var;

     if a and b and c;

    run;

    /*Merge data sets by a common variable and create output data

    sets based upon observation origin*/;

    data fileone;

     input id $ name $ dept $ project $; datalines; 000 Miguel A12 Document 111 Fred B45 Survey 222 Diana B45 Document 888 Monique A12 Document 999 Vien D03 Survey ; data filetwo;

     input id $ name $ projhrs; datalines; 111 Fred 35

    222 Diana 40

    777 Steve 0

    888 Monique 37

    999 Vien 42

    ; data both one_only two_only;

     merge fileone(in=in1) filetwo(in=in2);

     by id;

     if in1 and in2 then output both;

     else if in1 then output one_only;

     else output two_only;

     8

    run; proc print data=both;

    run;

    Note: Use IN= logic to determine if the current BY group is found in both data sets.

    6. Update statement

    It updates a master file by applying transactions.

Requirements

    ; The UPDATE statement must be accompanied by a BY statement that specifies

    the variables by which observations are matched.

    ; The BY statement should immediately follow the UPDATE statement to which it

    applies.

    ; The data sets listed in the UPDATE statement must be sorted by the values of the

    variables listed in the BY statement, or they must have an appropriate index.

    ; Each observation in the master data set should have a unique value of the BY

    variable or BY variables. If there are multiple values for the BY variable, only the

    first observation with that value is updated. The transaction data set can contain

    more than one observation with the same BY value. (Multiple transaction

    observations are all applied to the master observation before it is written to the

    output file.)

    Example:

    data master;

     input part_id price;

    cards;

    109 218

    110 156

    111 98

    112 36

    113 45

    ;

    data trans;

     input part_id price;

    cards;

    109 208

    110 149

    113 .

    121 78

    122 46

    ;

    data new;

     update master trans;

     by part_id;

    run;

    7. PROC EXPORT/IMPORT

    PROC EXPORT

    The EXPORT procedure reads data from a SAS data set and writes it to an external data source. External data sources can include DBMS tables, PC files, spreadsheets, and

     9

    delimited external files (which are files that contain columns of data values that are separated by a delimiter such as a blank or a comma).

PROC EXPORT DATA=SAS-data-set

    OUTFILE=filename| OUTTABLE=table-name

    <DBMS=identifier> <REPLACE>;

Required Arguments

    DATA=SAS-data-set

    identifies the input SAS data set with either a one- or two-level SAS name (library and member name). If you specify a one-level name, PROC EXPORT assumes the WORK library.

    OUTFILE="filename"

    specifies the complete path and filename of the output PC file, spreadsheet, or delimited external file. If the name does not include special characters (like the backslash in a path), lowercase characters, or spaces, you can omit the quotes.

OUTTABLE="tablename"

    specifies the table name of the output DBMS table. If the name does not include special characters (like question marks), lowercase characters, or spaces, you can omit the quotes. Note that the DBMS table name may be case-sensitive.

Options

DBMS=identifier

    specifies the type of data to export. For example, DBMS=DBF specifies to export a dBASE file. For PC files, spreadsheets, and delimited external files, you do not have to specify DBMS= if the filename specified with OUTFILE= contains a valid extension so that PROC EXPORT can recognize the type of data. DBMS = specifies the type of file to export, e.g. ACCESS, CSV, DLM, EXCEL, TAB.

REPLACE

    overwrites an existing file. If you do not specify REPLACE, PROC EXPORT does not overwrite an existing file.

/*Exporting a Excel File*/;

    data expt;

    input Product $ 1-10 Quantity Price ;

    datalines;

    Tea 10 16.00

    Beer 24 19.00

    Syrup 12 10.00

    Seasoning 48 22.00

    Mix 36 21.35

    ;

    proc export data=expt

     outfile="c:\Prices.xls"

     DBMS=Excel2000;

     *replace;

     10

Report this document

For any questions or suggestions please email
cust-service@docsford.com