
SAS programming

By Marilyn Cox,2014-02-21 14:04

SAS Programming - BASE

1 Introduction to SAS Programming

    1.1 Basic Concepts

SAS programs are built from two kinds of steps:

    - DATA steps typically create or modify SAS data sets.

    - PROC steps analyze and process the data in a SAS data set and present the data in the form of a report.

    RUN is a statement too.

    You can specify SAS statements in uppercase or lowercase. In most situations, text that is enclosed in quotation marks is case sensitive.

Every SAS file is stored in a SAS library:

    - Temporary SAS files: the Work library

    - Permanent SAS files: a user-defined library

A SAS data set is a file that consists of two parts: a descriptor portion and a data portion. Some SAS data sets also contain one or more indexes. For character variables, a blank represents a missing value. For numeric variables, a period represents a missing value.

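Format: formats are variable attributes that affect the way data values are written. SAS software offers a variety of character, numeric, and date and time formats. In other words, a format controls how values read out of a SAS data set are displayed.

    Informat: informats read data values in certain forms into standard SAS values. In other words, an informat controls how data is written into a SAS data set.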
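As a minimal sketch of the difference (the data set name work.visits, the variable names, and the data line are made up for illustration), the step below uses an informat to read a date and a PROC PRINT step uses formats to display the stored values:

```sas
data work.visits;                  /* hypothetical data set name */
  /* informat: read the text 03/15/2013 into a SAS date value */
  input VisitDate mmddyy10. Fee;
  datalines;
03/15/2013 1234.5
;

proc print data=work.visits;
  /* formats: control how the stored values are written */
  format VisitDate date9. Fee dollar10.2;
run;
```

The stored values do not change; only their appearance does. Here the date should print in the DATE9. form and the fee with a dollar sign, comma, and two decimal places.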

    1.2 Using the Programming Workspace

    When you delete a SAS library, the pointer to the library is deleted, and SAS no longer has access to the library. However, the contents of the library still exist in your operating environment.
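The same thing can be done in code rather than through the windowing environment. The sketch below assumes a hypothetical libref mylib; the CLEAR keyword of the LIBNAME statement only disassociates the libref, and the files in the folder are left untouched.

```sas
libname mylib 'C:\ytl';   /* assign the libref (mylib is an assumed name) */
libname mylib clear;      /* remove the pointer; the files in the folder still exist */
```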

    Which of the following is true about SAS output?

    a. You can create both listing output and HTML output.

    b. You can manage all types of SAS output in the Results window.

    c. Listing output and HTML output appear as separate items in the Results window.

    d. all of the above

Correct answer: d

    You can create listing output, HTML output, or both types of output. The Results window displays separate icons for the two basic types of output, and it helps you navigate and manage both types of output.

1.3 Referencing Files and Setting Options

    Remember that LIBNAME and OPTIONS statements remain in effect for the current SAS session only. The basic LIBNAME statement is:

    LIBNAME libref 'SAS-data-library';

    For example (LIBNAME is a global statement, so it needs no RUN statement of its own):

    libname libref 'C:\ytl';

    proc print data=libref.heart;

    run;

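Accessing other types of files:

The syntax is:

    LIBNAME libref engine 'SAS-data-library';

    For the following engines, 'SAS-data-library' must be a file name rather than a folder name:

    Engine    Description
    BMDP      allows read-only access to BMDP files
    OSIRIS    allows read-only access to OSIRIS files
    SPSS      allows read-only access to SPSS files

For example:

    libname testdata spss 'c:\myspss\data\labfiles.dat';

    Displaying information about a library:

    proc contents data=mylib._all_ nods;

    run;

    _ALL_ lists every file in the library (NODS suppresses the detailed descriptor information for each file). Naming a specific data set instead displays the detailed information for that data set.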

Two ways to display the contents of a data set:

    proc datasets;

     contents data=sasuser.admit varnum;

    quit;

    proc contents data=sasuser.admit varnum;

    run;

To change the output settings, select Tools > Options > Preferences, then click the Results tab.

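Changing system settings with the OPTIONS statement:

    options nonumber nodate;   suppresses the page number and the date; NUMBER and DATE turn them back on
    options pagesize=nn;       nn is the number of lines per page
    options linesize=nn;       nn is the width of a line
    options pageno=nn;         nn is the number given to the first page of output (default 1)
    options yearcutoff=nnnn;   affects only the interpretation of two-digit years

    Example: with options yearcutoff=1950; two-digit years are read as follows:

    Date expression    Interpreted as
    04/15/30           04/15/2030
    15Apr95            15Apr1995

    Use OBS= and FIRSTOBS= to select observations:

    options obs=nn;        stop processing at observation nn (the first nn observations when FIRSTOBS=1)
    options firstobs=nn;   start processing at observation nn

    Use options obs=max; to restore the default.

    The following two uses of FIRSTOBS= and OBS= differ in scope. As system options, they apply to every data set read for the rest of the session, or until they are reset:

    options firstobs=10 obs=15;

    As data set options, they apply only to that data set in that step:

    proc print data=sasuser.heart(firstobs=10 obs=15);

    run;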
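A minimal sketch of the difference in scope, reusing the sasuser.heart data set named above: the data set option limits a single step, while the system option stays in effect for later steps until it is reset.

```sas
/* Data set option: affects only this PROC PRINT step */
proc print data=sasuser.heart(firstobs=10 obs=15);
run;

/* System option: affects this step and every later step in the session */
options firstobs=10 obs=15;
proc print data=sasuser.heart;
run;

/* Restore the defaults so later steps read all observations */
options firstobs=1 obs=max;
```

Resetting the system options at the end is the part that is easy to forget; without it, subsequent steps silently process only observations 10 through 15.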

Other options:

    Option Description

    FORMCHAR='formatting-characters' specifies the formatting characters for your output device. Formatting characters are used to construct the outlines of tables as well as dividers for various procedures, such as the FREQ and TABULATE procedures. If you do not specify formatting characters as an option in the procedure, then the default specifications given in the FORMCHAR= system option are used.

    FORMDLIM='delimit-character' specifies a character that is used to delimit page breaks in SAS System output. Normally, the delimit character is null. When the delimit character is null, a new physical page starts whenever a page break occurs.

    LABEL|NOLABEL permits SAS procedures to temporarily replace variable names with descriptive labels. The LABEL system option must be in effect before the LABEL option of any procedure can be used. If NOLABEL is specified, then the LABEL option of a procedure is ignored. The default setting is LABEL.

    REPLACE|NOREPLACE specifies whether permanently stored SAS data sets are replaced. If you specify NOREPLACE, a permanently stored SAS data set cannot be replaced with one that has the same name. This prevents you from inadvertently replacing existing SAS data sets. The default setting is REPLACE.

    SOURCE|NOSOURCE controls whether SAS source statements are written to the SAS log. NOSOURCE specifies not to write SAS source statements to the SAS log. The default setting is SOURCE.

1.4 Editing and Debugging SAS Programs

    - Include a program by issuing an INCLUDE command: INCLUDE 'physical-file-name'

    - Set or remove a bookmark in the Enhanced Editor: Ctrl+F2

    - To invoke the debugger, add the DEBUG option to the DATA statement and execute the program:

     data perm.publish / debug;

     infile pubdata;

     input BookID $ Publisher & $22. Year;

     run;

     proc print data=perm.publish;

     run;

    1.5 Creating List Reports

    1. The simplest form:

    proc print data=Sasuser.admit noobs; run;

    

    libname patients 'c:\records\patients';

     proc print data=patients.therapy;

     run;

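    2. Select the columns to display:

    var age height weight fee;

    3. Use the ID statement to replace the Obs column (the row-number column) with the specified columns: ID column1 column2;

    proc print data=sales.reps;

     id idnum lastname;

     run;

    4. Select rows: WHERE where-expression;

    Example: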

    proc print data=clinic.admit;

     var age height weight fee;

     where age>30;

    run;

Comparison operators:

    Symbol      Meaning                     Example
    = or eq     equal to                    where name='Jones, C.';
    ^= or ne    not equal to                where temp ne 212;
    > or gt     greater than                where income>20000;
    < or lt     less than                   where partno lt "BG05";
    >= or ge    greater than or equal to    where id>='1543';
    <= or le    less than or equal to       where pulse le 85;

    Comparisons can be combined with logical operators and the IN operator:

    where company in ('ACME' , 'RELIABLE') or pctinsured in (80,100);

    5. Sorting

    The syntax is:

    PROC SORT DATA=SAS-data-set <OUT=SAS-data-set>;

     BY BY-variable(s);

    RUN;

    6. Summing

    proc print data=clinic.insure;

     var name policy balancedue;

     where pctinsured < 100;

     sum balancedue;

     run;

    Sort by a grouping variable, then subtotal within each group:

    proc sort data=clinic.admit out=work.activity;

     by actlevel;

     run;

     proc print data=work.activity;

     var age height weight fee;

     where age>30;

     sum fee;

     by actlevel; /* this BY works with SUM, so fee is subtotaled for each value of actlevel */

     id actlevel;

     pageby actlevel;

     run;

    In the program that produces the output shown below, which set of statements is used?

    Dest    Boarded    Deplaned
    LON         167         222
                150         320
    LON         317         542
    PAR         177         227
                177         203
    PAR         354         430
                671         972

    a. sum dest;
       by boarded deplaned;

    b. sum boarded deplaned;
       by dest;

    c. sum boarded deplaned;
       by dest;
       id dest;

    d. sum boarded deplaned;
       id by dest;

The correct answer is the third one (c):

    sum boarded deplaned;

    by dest;

    id dest;

    7. Controlling the output layout

    The DOUBLE option double-spaces the rows of the report.
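As a short sketch, DOUBLE is written on the PROC PRINT statement itself. The data set and variables are borrowed from the earlier examples, and the effect shows up in listing output.

```sas
proc print data=clinic.admit double;   /* DOUBLE: blank line between rows */
  var age height weight fee;
run;
```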

    8. Adding titles and footnotes

    TITLE<n> 'text';

    FOOTNOTE<n> 'text';
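A hedged sketch of how these statements are typically used (the title and footnote text is invented for illustration). TITLE and FOOTNOTE are global statements, so they stay in effect for later steps until they are replaced or cancelled:

```sas
title1 'Patient Admissions';                 /* hypothetical text */
title2 'Age, Height, Weight, and Fee';       /* second title line */
footnote1 'Source: clinic.admit';            /* hypothetical text */

proc print data=clinic.admit;
  var age height weight fee;
run;

title;      /* a null TITLE statement cancels all titles */
footnote;   /* a null FOOTNOTE statement cancels all footnotes */
```

Cancelling at the end keeps the titles from leaking into reports produced by later steps.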

    9. Changing the displayed column names with labels

     proc print data=clinic.admit label;

     var age height;

     label age='Age of Patient';

     label height='Height in Inches';

     run;

    10. Displaying data values in a given format

    Format       Specifies values ...                                                        Example
    COMMAw.d     that contain commas and decimal places                                      comma8.2
    DOLLARw.d    that contain dollar signs and commas                                        dollar6.2
    MMDDYYw.     as date values of the form 09/12/97 (MMDDYY8.) or 09/12/1997 (MMDDYY10.)    mmddyy10.
    w.           rounded to the nearest integer in w spaces                                  7.
    w.d          rounded to d decimal places in w spaces                                     8.2
    $w.          as character values in w spaces                                             $12.
    DATEw.       as date values of the form 16OCT99 (DATE7.) or 16OCT1999 (DATE9.)           date9.

    Example:

    This FORMAT statement ...              To display values as ...
    format date mmddyy8.;                  06/05/03
    format net comma5.0 gross comma8.2;    1,234 and 5,678.90
    format net gross dollar9.2;            $1,234.00 and $5,678.90

proc print data=clinic.admit;

     var actlevel fee;

     where actlevel='HIGH';

     format fee dollar4.;

     run;

    11. Split character: SPLIT= names the character at which column labels are broken across lines

    proc print data=reps split='*';

     var salesrep type unitsold net commission;

     label salesrep='Sales*Representative';

    run;

    12. User-defined formats

    proc format;

     value $repfmt

     'TFB'='Bynum'

     'MDC'='Crowley'

     'WKK'='King';

    run;

    proc print data=vcrsales;

     var salesrep type unitsold;

     format salesrep $repfmt.;

     run;
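The example above defines a character format. As a sketch under assumptions, a numeric format can group ranges of values in the same way; the format name agegrp and the cutoff of 30 are hypothetical, and the data set is borrowed from the earlier PROC PRINT examples.

```sas
proc format;
  value agegrp                 /* numeric format, so no $ prefix */
    low-<30 = 'Under 30'       /* everything below 30 */
    30-high = '30 and over';   /* 30 and above */
run;

proc print data=clinic.admit;
  var age fee;
  format age agegrp.;          /* note the period after the format name */
run;
```

The stored values of age are unchanged; the format only affects how they are displayed in this step.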

    1.6 Creating SAS Data Sets from Raw Data

    1. Steps for creating a SAS data set from external data, and the statement each step uses:

    To do this...                 Use this SAS statement...    Example
    Reference SAS data library    LIBNAME statement            libname libref 'SAS-data-library';
    Reference external file       FILENAME statement           filename tests 'c:\users\tmill.dat';
    Name SAS data set             DATA statement               data clinic.stress;
    Identify external file        INFILE statement             infile tests obs=10;
    Describe data                 INPUT statement              input ID $ 1-4 Age 6-7 ActLevel $ 9-12 Sex $ 14;
    Execute DATA step             RUN statement                run;
    List the data                 PROC PRINT statement         proc print data=clinic.stress;
    Execute final program step    RUN statement                run;

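    A FILENAME statement can point to a single file or to a directory. For example, assuming a fileref has already been defined with a FILENAME statement:

    - When the fileref points to a single file:

    o infile fileref obs=10;

    - When the fileref points to a directory, read the file tax.dat in that directory:

    o infile fileref(tax.dat);

    o In the Windows environment: infile tax('refund');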
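Put together, reading one file from a directory fileref might look like the sketch below. The fileref taxes, the directory path, the output data set, and the record layout are all assumptions made for illustration.

```sas
filename taxes 'c:\users\tax';      /* hypothetical directory that contains tax.dat */

data work.refund;                   /* hypothetical output data set */
  infile taxes('tax.dat') obs=10;   /* fileref(file-name): one file in the directory */
  input ID $ 1-4 Amount 6-13;       /* hypothetical record layout */
run;
```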

    2. The INPUT statement

    INPUT variable <$> startcol-endcol . . . ;

    where

    - variable is the SAS name that you assign to the field

    - the dollar sign ($) identifies the variable type as character (if the variable is numeric, then nothing appears here)

    - startcol represents the starting column for this variable

    - endcol represents the ending column for this variable.

    3. Computing new columns with expressions

    Operator    Action             Example         Priority
    -           negative prefix    negative=-x;    I
    **          exponentiation     raise=x**y;     I
    *           multiplication     mult=x*y;       II
    /           division           divide=x/y;     II
    +           addition           sum=x+y;        III
    -           subtraction        diff=x-y;       III

    data sasuser.stress;
     infile tests;
     input ID 1-4 Name $ 6-25 RestHR 27-29 MaxHR 31-33
           RecHR 35-37 TimeMin 39-40 TimeSec 42-43
           Tolerance $ 45;
     TestDate='01jan2000'd;
     Time='9:25't;
     DateTime='18jan2005:9:27:05'dt;
     TotalTime=(timemin*60)+timesec;
     resthr=resthr+(resthr*.10);  /* left side: result; right side: original value */
    run;

    4. Use the DATALINES statement to enter data directly in the program:

    data sasuser.stress;
     input ID 1-4 Name $ 6-25 RestHR 27-29 MaxHR 31-33
           RecHR 35-37 TimeMin 39-40 TimeSec 42-43
           Tolerance $ 45;
     if tolerance='D';
     TotalTime=(timemin*60)+timesec;
     /* column input: the data lines below must start in column 1 */
     datalines;
2458 Murray, W            72  185 128 12 38 D
2462 Almers, C            68  171 133 10 5  I
2501 Bonaventure, T       78  177 139 11 13 I
;

    5. The IF statement and the PUT statement

    data clinic.stress;

    infile tests ;

    input ID $ 1-4 Age 6-7 ActLevel $ 9-12 Sex $ 14;

    if ID='2810';

    s=ID*10;

    run;
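The example above shows only the subsetting IF. As a hedged sketch with made-up data lines, the PUT statement writes text and variable values to the SAS log, which is handy for checking which observations satisfy a condition:

```sas
data _null_;                 /* _NULL_: run the step without creating a data set */
  input ID $ 1-4 Age 6-7;
  /* for matching observations, write a message such as ID=2810 Age=61 to the log */
  if Age > 30 then put 'Over 30: ' ID= Age=;
  datalines;
2810 61
2804 23
;
```

Only the first data line passes the condition here, so only one message is written to the log.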

    Test:

    Write a subsetting IF statement that selects observations for subsequent processing only if the value of Salary exceeds $25,000.

    if salary > 25000;

data work.test;

     infile loan;

     input Code $ 1 Amount 3-10 Rate 12-16

     Account $ 18-25 Months 27-28;

     if code='1' then type='variable';

     else if code='2' then type='fixed';

    run;
