Class Note By Examples




General information about SAS

SAS Institute web site:

SAS Online Document:

SAS Environment

The SAS windowing environment has three main windows:

Program Editor

Log

Output

Normally, if PC SAS is installed on your computer, you are working in the Windows environment. If your PC SAS is connected to a Unix SAS server through SAS/CONNECT, you can also work in the Unix SAS environment. Figure 2 shows the PC SAS interface.

If SAS Enterprise Guide (SAS/EG) is installed on your computer and uses a local server connected to your PC SAS, you are also working in the Windows environment. If your SAS/EG is connected to a Unix server through SAS/CONNECT, you can also work in the Unix SAS environment. Figure 3 shows the SAS Enterprise Guide interface.

For this class, we use SAS OnDemand for Academics, which runs on a Unix server, so we work in the Unix SAS environment. Figure 1 illustrates the SAS environment.

In most cases, a SAS program is the same in the Windows and Unix environments, with two minor differences: first, directory paths use a backslash ('\') in Windows and a forward slash ('/') in Unix; second, directory and file names are not case sensitive in Windows but are case sensitive in Unix.

Figure 2: PC SAS interface

Figure 3: SAS Enterprise Guide (SAS/EG) interface

Getting Started with SAS Programs

* In the following code, the stock values are random numbers;

* They do not reflect past, current or future stock prices;

data stocks;

input ticker $ price industry $;

cards;

ATT 55.25 TECH

LU 48.8 TECH

MSFT 67.87 TECH

PFS 45.9 PHAR

CPQ 28.6 TECH

MRK 72.43 PHAR

AHP 67.29 PHAR

JPM 51.93 FINAN

C 69.72 FINAN

FBF 48.65 FINAN

AOL 38.72 TECH

CSCO 32.64 TECH

PVN 37.4 FINAN

BMS 57.21 PHAR

JNJ 61.23 PHAR

;

run;

proc print data=stocks;

run;

***************************************************************************;

data stocks2;

length ticker $8 price 8 Industry $8;

format price 5.2;

ticker='ATT' ; price=55.25 ; Industry='TECH' ; output;

ticker='LU' ; price=48.8 ; Industry='TECH' ; output;

ticker='MSFT'; price=67.87 ; Industry='TECH' ; output;

ticker='PFS' ; price=45.9 ; Industry='PHAR' ; output;

ticker='CPQ' ; price=28.6 ; Industry='TECH' ; output;

ticker='MRK' ; price=72.43 ; Industry='PHAR' ; output;

ticker='AHP' ; price=67.29 ; Industry='PHAR' ; output;

ticker='JPM' ; price=51.93 ; Industry='FINAN'; output;

ticker='C' ; price=69.72 ; Industry='FINAN'; output;

ticker='FBF' ; price=48.65 ; Industry='FINAN'; output;

ticker='AOL' ; price=38.72 ; Industry='TECH' ; output;

ticker='CSCO'; price=32.64 ; Industry='TECH' ; output;

ticker='PVN' ; price=37.4 ; Industry='FINAN'; output;

ticker='BMS' ; price=57.21 ; Industry='PHAR' ; output;

ticker='JNJ' ; price=61.23 ; Industry='PHAR' ; output;

run;

proc print data=stocks2;

run;

***************************************************************************;

Libname dd '/courses/ddbf9765ba27fe300';

data work.stocks;

set dd.stocks;

run;

SAS Statement Syntax Review

There are three types of SAS statements:

Statements used only in DATA steps

Statements used only in PROC steps

Global statements, which can be used anywhere (system options, LIBNAME statements, macro statements, etc.)

General rules

Usually begin with an identifying keyword (such as DATA, SET, LENGTH, or RUN)

Always end with a semicolon (;)

Not case sensitive

Note: In most cases, text in quotes is case-sensitive.

On some operating systems (such as Unix), directory paths and file names are case sensitive.

SAS statements are free format.

They can begin and end in any column, so you can indent a program to make it easier to read.

One statement can span several lines.

Blanks can be used to separate words. Special characters also separate words.

Comments can be added to the program.
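A minimal sketch of these rules follows; the data set and variable names (rules_demo, name, x, y, total) are hypothetical. It shows keywords in mixed case, a free-format layout, one statement split across two lines, and both comment styles.

```sas
/* Block comment: everything between the delimiters is ignored */
* Comment statement: ends at the first semicolon;
DATA Rules_Demo;                 /* keywords are not case sensitive */
   LENGTH Name $ 12;
   name = 'Free Format';         /* NAME and name are the same variable */
   x = 1;   y = 2;               /* several statements on one line */
   total = x
         + y;                    /* one statement spread over two lines */
RUN;

proc print data=rules_demo;      /* data set names are not case sensitive either */
run;
```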

SAS step boundaries

A SAS step begins with one of the following keywords:

1. DATA statement

2. PROC statement

The end of a SAS step is identified by one of the following keywords:

1. RUN statement (for DATA steps and most procedures)

2. DATA statement (the start of the next step)

3. PROC statement (the start of the next step)

4. QUIT statement (for some procedures, such as PROC SQL)
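The step boundaries can be seen in a short sketch like the following (the data set boundary_demo is a made-up example). The DATA step ends at RUN, PROC PRINT ends at RUN, and PROC SQL, which executes each statement as it is submitted, is ended with QUIT.

```sas
data boundary_demo;               /* DATA statement starts a DATA step */
   do i = 1 to 3;
      output;
   end;
run;                              /* RUN ends the DATA step */

proc print data=boundary_demo;    /* PROC statement starts a PROC step */
run;                              /* RUN ends most procedures */

proc sql;                         /* PROC SQL executes each statement as it is submitted */
   select count(*) as n_rows from boundary_demo;
quit;                             /* QUIT ends procedures such as PROC SQL */
```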

Rules for SAS data set and variable names:

Must start with a letter (A-Z or a-z) or an underscore (_)

Can be uppercase, lowercase, or mixed case

In SAS 8 and later, can be 1 to 32 characters long

Can contain letters, digits, and underscores

Cannot contain blanks or tabs
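As an illustration of the naming rules, the following sketch uses hypothetical names that are all valid under the rules above; the invalid forms appear only in a comment, because SAS would reject them.

```sas
data _names_demo;                             /* starts with an underscore */
   Price2024 = 55.25;                         /* letters and digits, mixed case */
   _flag = 1;                                 /* leading underscore */
   closing_price_adjusted_for_split = 61.23;  /* 32 characters, the maximum length */
run;

/* These names would be rejected:                  */
/*   2024price    (starts with a digit)            */
/*   stock price  (contains a blank)               */
/*   any name longer than 32 characters            */
```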

Variable values:

A missing character value is represented by a blank (' ').

A character variable can be 1 to 32,767 (2**15-1) bytes long. The default length is 8 characters (for example, for character variables read with list input).

A missing numeric value is represented by a period (.).

A numeric variable is stored in floating-point format with a length of 3 to 8 bytes; the default is 8 bytes. Eight bytes of floating-point storage hold about 15 to 16 significant decimal digits.
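The following sketch (hypothetical data set missing_demo) assigns missing values of both types and sets the lengths explicitly with a LENGTH statement. It assumes SAS 9.2 or later for the CMISS function, which counts missing arguments of either type.

```sas
data missing_demo;
   length ticker $ 4 price 8;                /* character: 4 bytes; numeric: 8 bytes (the default) */
   ticker = ' ';    price = .;     output;   /* both values missing */
   ticker = 'MSFT'; price = 67.87; output;
run;

data _null_;
   set missing_demo;
   n_missing = cmiss(ticker, price);         /* counts missing character and numeric arguments */
   put ticker= price= n_missing=;            /* writes the values to the log */
run;
```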

SAS data libraries

SAS accesses a directory (a SAS data library) through a libref, loosely called a libname.

A libref refers to a directory, not a file. A directory may contain many SAS data sets.

SAS accesses an individual external file through a fileref, loosely called a filename. External files are not SAS data sets; they are text files, Excel files, and so on. One fileref refers to a single external file.


SAS accesses a directory (a SAS data library) through a libref, which is defined with the LIBNAME statement:

LIBNAME libref 'SAS-data-library' ;

Or LIBNAME libref 'full directory path' ;

Example: LIBNAME proj1 'c:\project1';

Libname dd '/courses/ddbf9765ba27fe300';

Rules for naming a libref

• Must begin with a letter or underscore

• The remaining characters are letters, numbers, or underscores

• Must be 8 characters or less

A specific SAS file (data set) is referenced by a two-level name: libref.filename

Example: dd.stocks references the SAS data set /courses/ddbf9765ba27fe300/stocks.sas7bdat

Note: A two-level name (libref.filename) refers only to SAS data sets, not to other types of files such as text files.

Temporary libref “work”:

For every SAS session, SAS creates a temporary directory (library) and automatically assigns the libref "work" to it. To refer to a SAS data set in the WORK library, use work.data_name or just data_name: when the libref is omitted, the default libref is work.

The WORK library is automatically deleted when the SAS session ends, so all SAS data sets in the WORK library are lost at that point.
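A small sketch of one-level and two-level names for the WORK library (the data set name demo is hypothetical). The %PUT line writes the physical path of the WORK directory to the log by using the PATHNAME function.

```sas
data demo;                          /* one-level name: stored as WORK.DEMO */
   x = 1;
run;

proc print data=work.demo;          /* the same data set, referenced with a two-level name */
run;

%put WORK library path: %sysfunc(pathname(work));   /* physical directory of WORK */
```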

Referencing external data files (non-SAS data):

SAS accesses an individual external file through a fileref, which is defined with the FILENAME statement:

FILENAME fileref 'full directory path and filename' ;

Example: FILENAME stock '/courses/ddbf9765ba27fe300/STOCK.TXT';

Rules for naming a fileref: the rules are the same as the rules for naming a libref above.

Note: On UNIX, directory and file names are case sensitive.
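To try a fileref without depending on a course file, the following sketch assigns a hypothetical fileref txtdemo to a temporary file (the TEMP device), writes two records to it with FILE and PUT, and reads them back with INFILE and INPUT.

```sas
filename txtdemo temp;              /* TEMP device: a temporary file for this session */

data _null_;
   file txtdemo;                    /* write text through the fileref */
   put 'ATT 55.25 TECH';
   put 'LU 48.8 TECH';
run;

data from_file;
   infile txtdemo;                  /* read the same file through the fileref */
   input ticker $ price industry $;
run;

proc print data=from_file;
run;
```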

Styles of Reading Input Data into a SAS Data Set

List Input:

data stocks;

input ticker $ price Industry $;

cards;

ATT 55.25 TECH

LU 48.8 TECH

MSFT 67.87 TECH

PFS 45.9 PHAR

;

run;

Column Input:

Note: The keywords CARDS and DATALINES are interchangeable.

data stocks;

input ticker $ 1-6 price 10-18 Industry $ 20-28;

datalines;

ATT      55.25     TECH
LU       48.8      TECH
MSFT     67.87     TECH
PFS      45.9      PHAR

;

run;

Formatted input:

data cust;

input Name $ @9 birthday date7. @20 amount comma5.;

format birthday date7.;

cards;

John    12SEP83    2,234
Smith   23JAN92    1,345
Bob     03APR85    4,234
Steve   08AUG88    6,924

;

run;

Reading the External File

* Refer to the file by complete path and filename. ;

data stocks;

infile '/courses/ddbf9765ba27fe300/STOCK.TXT';

input ticker $ price very_long_name $;

run;

* Refer to the file by Fileref. ;

A fileref (bb in the following example) is 1 to 8 characters long, starts with a letter or an underscore, and can contain letters, digits, and underscores.

filename bb '/courses/ddbf9765ba27fe300/bill.txt';

data bill;

infile bb FIRSTOBS=2;

input fname $1-13 lname $14-25 ssn1 ssn2 ssn3 areacd phonenum $

@68 bal1 dollar8. @77 duedt yymmdd10. @88 billdt date9.;

run;

The advantages of using a fileref:

It is more convenient when you need to use the same file many times.

You can put all filerefs at the beginning of the program. It is easy to know which files are needed by the program, and easy to modify.

* Read delimited text file ;

filename dw '/courses/ddbf9765ba27fe300/dow_hist_comma.txt';

data dow_history;

informat date date9.;

format date date9.;

infile dw dlm=',';

input date open high low close volume adj_close ;

run;
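For practice without the course file, the same delimited-reading technique can be sketched with in-stream data. The data set name dow_demo and the values are made up and do not reflect actual Dow history; INFILE DATALINES makes the DLM= option available for in-stream data.

```sas
data dow_demo;
   infile datalines dlm=',';        /* fields are separated by commas */
   informat date date9.;            /* how to read the date text */
   format date date9.;              /* how to display the SAS date value */
   input date open close;
   datalines;
02JAN2020,100.50,101.25
03JAN2020,101.25,99.80
;
run;

proc print data=dow_demo;
run;
```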

Reading Raw Data Files in Depth

• List Input

In the real world, most raw data are stored in text files, so we use a fileref to refer to the raw data.

filename mydata '/courses/ddbf9765ba27fe300/STOCK.TXT';

data stocks;

infile mydata;

input ticker $ price very_long_name $;

run;

* The following is the content of STOCK.TXT;

ATT 55.25 TECH

LU 48.8 TECH

MSFT 67.87 TECH

PFS 45.9 PHAR

CPQ 28.6 TECH

MRK 72.43 PHAR

AHP 67.29 PHAR

JPM 51.93 FINAN

C 69.72 FINAN

FBF 48.65 FINAN

AOL 38.72 TECH

CSCO 32.64 TECH

PVN 37.4 FINAN

BMS 57.21 PHAR

JNJ 61.23 PHAR

List input is simple, but there are several restrictions:

1. Fields must be separated by at least one blank.

2. Fields must be read in order.

3. A missing numeric value must be represented by a period.

4. A missing character value cannot be represented by a blank, because the blank would cause a mismatch between variables and values.

5. Character values cannot contain embedded blanks.

6. The default length of character variables is 8 bytes (8 characters); longer values are truncated.

7. Data must be in standard character or numeric format.
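A short sketch of restrictions 3 and 6, using made-up records in a hypothetical data set list_demo: the period is read as a missing price, and the 14-character ticker is truncated to the default length of 8 (PHARMACE).

```sas
data list_demo;
   input ticker $ price industry $;
   datalines;
ATT . TECH
PHARMACEUTICAL 45.9 PHAR
;
run;

proc print data=list_demo;
run;
```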

* Column Input;

If the data values are aligned in fixed columns, column input offers several advantages.

filename mydata '/courses/ddbf9765ba27fe300/col_input.txt';

data col_input;

infile mydata;

input name $ 1-12 date $ 14-20 amount 23-26;

run;

The following is the data in col_input.txt:

/* Ruler
         1         2
123456789012345678901234567890 */
John Dell    12SEP83  2234
Smith Gold   23JAN92  1345
Bob Chen     03APR85  4234
Steve Chang  08AUG88  6924
Michael West 25APR79  3414
Nancy Brown  17JUN85  4938

The advantages of using column input:

• Character variables can be up to 32,767 (2**15-1) characters long, not limited to the default length of 8 bytes.

• Character values can contain embedded blanks.

• Fields can be read in any order.

• No placeholder is required for missing data.

• Parts of the input record can be skipped.

• Fields or parts of fields can be reread.
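The following sketch illustrates several of these advantages with made-up records in the same layout as col_input.txt (hypothetical data set col_demo): fields are read out of order, the last two digits of the date (columns 19-20) are reread as a separate variable yr, and the second record has no amount and no placeholder. TRUNCOVER is added as an assumption, to guard against in-stream records that end before column 26.

```sas
data col_demo;
   infile datalines truncover;      /* assumption: protects against short in-stream records */
   /* amount is read first, and columns 19-20 of the date are reread as yr */
   input amount 23-26 name $ 1-12 date $ 14-20 yr $ 19-20;
   datalines;
John Dell    12SEP83  2234
Bob Chen     03APR85
Michael West 25APR79  3414
;
run;

proc print data=col_demo;
run;
```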

• Formatted Input

Some data cannot be read correctly without an appropriate informat.

filename mydata '/courses/ddbf9765ba27fe300/fmt_input.txt';

data fmt_input;

infile mydata;

input name $ 1-12 @14 date date7. @23 amount dollar6.;

run;

* The following is the data in fmt_input.txt;

John Dell    12SEP83  $2,234
Smith Gold   23JAN92  $1,345
Bob Chen     03APR85  $4,234
Steve Chang  08AUG88  $6,924
Michael West 25APR79  $3,414
Nancy Brown  17JUN85  $4,938

The above program can also be written as the following:

filename mydata '/courses/ddbf9765ba27fe300/fmt_input.txt';

data fmt_input;

infile mydata;

input name $ 1-12 +1 date date7. +2 amount dollar6.;

run;

Features of formatted input:

• Formatted input reads a value until it has read the number of columns indicated by the informat.

• The pointer position can be controlled in two ways: absolute column pointers (@n) and relative column pointers (+n).

• It can read data stored in nonstandard form.

• Informats can be specified as needed.

Reading the Same Record Twice: Line-hold specifier @

data NYC;

input city $ 18-32 @;

if city='New York';

input FLNO 1-4 AirLine $ 6-17 city $ 18-32 Time $ 34-40;

cards;

9238 American    New York        10:00
4235 United      Philadelphia    5:00pm
798  Delta       New York        8:50
4824 North West  Houston         12:30pm
1639 South West  Chicago         4:15pm
5417 North West  New York        11:25

;

run;

Creating Multiple Observations from a Single Record: Double Trailing @

data Risk_Score;

input risk_level $ score @@;

cards;

H 580 L 800 M 680

M 690 L 780 H 620

H 610 M 685 L 795

;

run;

proc print data=Risk_Score;

Title 'Risk Level and Score';

run;

Important INFILE statement options:

LRECL=logical-record-length: Specifies the logical record length. The default was 256 in earlier SAS releases; beginning with SAS 9.4, the default is 32767.

MISSOVER

Prevents an INPUT statement from reading a new input data record if it does not find values in the current input line for all the variables in the statement. When an INPUT statement reaches the end of the current input data record, variables without any values assigned are set to missing.

Use MISSOVER if the last field(s) may be missing and you want SAS to assign missing values to the corresponding variables.

TRUNCOVER

TRUNCOVER overrides the default behavior of the INPUT statement when an input data record is shorter than the INPUT statement expects. By default, the INPUT statement automatically reads the next input data record. TRUNCOVER enables you to read variable-length records when some records are shorter than the INPUT statement expects. Variables without any values assigned are set to missing.

Use TRUNCOVER to assign the contents of the input buffer to a variable when the field is shorter than expected.

EXPANDTABS

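This option is needed when reading a file that contains tab characters. It expands each tab to the next standard tab stop (every 8 columns, starting at column 9).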

Example: LRECL=

* Read a text file line by line into a SAS data set. In SAS releases before 9.4 the default record length is 256, so to read records longer than 256 characters the LRECL= option has to be used;

* Without LRECL= option;

data temp;

infile '/courses/ddbf9765ba27fe300/LRECL400.txt';

input line $ 1-400 ;

run;

options ls=100;

data _null_;

set temp;

if _n_=1 then put line;

run;

* With LRECL= option;

data temp;

infile '/courses/ddbf9765ba27fe300/LRECL800.txt' lrecl=400;

input line $ 1-400 ;

run;

options ls=100;

data _null_;

set temp;

if _n_=1 then put line;

run;

Example: MISSOVER

filename mmm '/courses/ddbf9765ba27fe300/stocks_missover.txt';

* No missover option, only 13 rows read into the sas data.;

data stocks;

infile mmm;

input ticker $ price industry $;

run;

filename mov '/courses/ddbf9765ba27fe300/stocks_missover.txt';

* With missover option, all rows are read into the sas data.;

data stocks;

infile mov missover;

input ticker $ price industry $;

run;
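Because the contents of stocks_missover.txt are not shown here, the following sketch reproduces the effect with made-up in-stream data in which the second record has no industry value (hypothetical data sets no_missover and with_missover). Without MISSOVER, INPUT moves to the next line to find INDUSTRY, so only two observations are created; with MISSOVER, three observations are created and INDUSTRY is missing for LU.

```sas
data no_missover;
   infile datalines;                   /* default behavior: FLOWOVER */
   input ticker $ price industry $;
   datalines;
ATT 55.25 TECH
LU 48.8
MSFT 67.87 TECH
;
run;

data with_missover;
   infile datalines missover;          /* a short record leaves INDUSTRY missing */
   input ticker $ price industry $;
   datalines;
ATT 55.25 TECH
LU 48.8
MSFT 67.87 TECH
;
run;
```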

Example: TRUNCOVER

* Read text file into sas dataset without truncover;

data temp;

infile '/courses/ddbf9765ba27fe300/Truncover_effect.txt';

input line $ 1-256 ;

run;

proc print data=temp;

run;

* Read text file into sas dataset with truncover;

data temp;

infile '/courses/ddbf9765ba27fe300/Truncover_effect.txt' truncover;

input line $ 1-256 ;

run;

proc print data=temp; run;

Note: When the data follow a CARDS or DATALINES statement in the program, use INFILE CARDS (or INFILE DATALINES) to apply INFILE options:

data tt;

infile cards truncover;

input line $1-100;

cards;

Test data:

This is a text data with variable length of fields for a demo.

To show truncover effect.

;

run;

proc print data=tt; run;
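The difference between MISSOVER and TRUNCOVER shows up when a field is only partly present at the end of a record. The sketch below (hypothetical fileref shortrec on a temporary file) writes a 3-character record and a 10-character record, then reads columns 1-8: with MISSOVER the partial value on the short record becomes missing, while TRUNCOVER keeps 'ABC'.

```sas
filename shortrec temp;                 /* temporary file for the demo */

data _null_;
   file shortrec;
   put 'ABC';                           /* 3 characters: shorter than columns 1-8 */
   put 'ABCDEFGHIJ';                    /* 10 characters */
run;

data use_missover;
   infile shortrec missover;            /* a field cut off by the end of record becomes missing */
   input code $ 1-8;
run;

data use_truncover;
   infile shortrec truncover;           /* keeps the partial value */
   input code $ 1-8;
run;
```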

Example: EXPANDTABS

*** Without EXPANDTABS option, data is read incorrectly;

data stock;

infile '/courses/ddbf9765ba27fe300/ExpandTab.txt';

input ticker $ price industry $;

run;

*** With EXPANDTABS option, data is read correctly;

data stock;

infile '/courses/ddbf9765ba27fe300/ExpandTab.txt' EXPANDTABS;

input ticker $ price industry $;

run;
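Since ExpandTab.txt is not listed here, the sketch below builds a similar record on a temporary file (hypothetical fileref tabfile) by joining the fields with tab characters ('09'x), then reads it with EXPANDTABS. For files that are strictly tab-delimited, DLM='09'x on the INFILE statement is a common alternative.

```sas
filename tabfile temp;

data _null_;
   file tabfile;
   length line $ 40;
   line = catx('09'x, 'ATT', '55.25', 'TECH');   /* fields separated by tab characters */
   put line;
run;

data stock_tabs;
   infile tabfile expandtabs;           /* tabs are expanded to blanks up to the next tab stop */
   input ticker $ price industry $;
run;
```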

Output to a text file:

Example:

libname mydat '/courses/ddbf9765ba27fe300';

filename fmt_out '/courses/ddbf9765ba27fe300/fmt_input.txt';

data _null_;

set mydat.fmt_input;

file fmt_out;

put @1 name

@16 date date9.

@28 amount dollar6.;

run;

Example: Using a trailing @ to hold the output line.

libname mydat '/courses/ddbf9765ba27fe300';

filename phar '/courses/ddbf9765ba27fe300/phar.txt';

data _null_;

set mydat.stocks2;

file phar;

if industry='PHAR';

put @1 ticker @;

put @10 price dollar8.2 @;

put @20 industry;

run;

More details on using the DATA _NULL_ / PUT method to generate reports will be discussed later.

Read and Write Existing SAS Data Sets

SAS data libraries

SAS accesses a directory (a SAS data library) through a libref, which is defined with the LIBNAME statement:

LIBNAME libref 'SAS-data-library' ;

Or LIBNAME libref 'full directory path' ;

Example: LIBNAME proj1 'c:\project1';

Rules for naming a libref

• Must begin with a letter or underscore

• The remaining characters are letters, numbers, or underscores

• Must be 8 characters or less

A specific SAS file (data set) is referenced by a two-level name:

libref.filename

Example:

To refer to the Windows directory c:\temp, use the following statement:

Libname cc 'c:\temp';

To read the SAS data set /courses/ddbf9765ba27fe300/stocks.sas7bdat and copy it to the directory /home/yihong1/temp, use:

Libname dd '/courses/ddbf9765ba27fe300';

Libname temp '/home/yihong1/temp';

Data temp.stocks; * output data;

Set dd.stocks; * input data;

Run;

Temporary libref “work”:

For every SAS session, SAS creates a temporary directory (library) and automatically assigns the libref "work" to it. To refer to a SAS data set in the WORK library, use work.data_name or just data_name: when the libref is omitted, the default libref is work.

The WORK library is automatically deleted when the SAS session ends, so all SAS data sets in the WORK library are lost at that point.

Advanced topic:

1. Usually, you don’t need to know the physical path of the work directory. But if you want to find out where the work directory is, use the following sas code:

options ls=120;

%let work=%sysfunc(getoption(work));

%put &work;

2. If your PC or laptop is connected to a remote server, you can also assign a libref to a remote directory by using the REMOTE engine.

Example: libname proj1 remote '/users/myid/mydata' server=servername;

End Advanced topic.

SAS data set name extension:

• In SAS 8 and later, the file extension is .sas7bdat on both Windows and Unix.

Note: When referring to a SAS data set, omit the SAS data set extension.

Examples:

To read the SAS data set /courses/ddbf9765ba27fe300/stocks.sas7bdat into the WORK data set work.stocks:

Libname cdat '/courses/ddbf9765ba27fe300';

data work.stocks; *** work. can be omitted;

*** Cannot use work.stocks.sas7bdat;

set cdat.stocks;

run;

To write work.stocks to the directory /courses/ddbf9765ba27fe300 as jk.sas7bdat:

Libname cdat '/courses/ddbf9765ba27fe300';

Data cdat.jk;

Set stocks;

Run;

To read the SAS data set /courses/ddbf9765ba27fe300/stocks.sas7bdat and write it to /home/yihong1/demo/stocks.sas7bdat:

Libname ind '/courses/ddbf9765ba27fe300';

Libname outd '/home/yihong1/demo';

Data outd.stocks;

Set ind.stocks;

Run;

Note: On UNIX systems, directory and file names are case sensitive.

To view all librefs that have been assigned in the SAS session:

libname _all_ list;

New features in SAS 8 and later for reading and writing data

• In SAS version 8 and later, SAS data can be accessed by directly using full path and filename without using SAS libref. The following is an example:

data stocks;

set '/courses/ddbf9765ba27fe300/stocks';

run;

The Advantages of using libref

Although this method is available, it is not widely used. The advantages of using a libref are:

• When many SAS data sets are in the same directory, a short libref is more convenient than the full path.

• All librefs in a SAS program can be defined at the beginning of the program, which makes the program easier to read, maintain, and modify.

How to browse SAS data

• Using PROC PRINT

Proc print data=cdat.stocks; run;

When you run this program, the data are listed in the Output window.

If the data set is large, the following program prints only a small part of it:

Proc print data=cdat.stocks (obs=5); run; *** only print the first 5 observations;

Disadvantage: If the data set has many variables, the output wraps around in the Output window and can be messy and difficult to read.

• Using the SAS Explorer

Click on the Explorer tab on the bottom left of the SAS window

Then double click the Libraries icon

Then double click cdat library icon

Then double-click the stocks data set icon. The SAS data set opens in a VIEWTABLE window, with horizontal and vertical scroll bars if the data are larger than the screen.

Note: You have to close this VIEWTABLE window before you can use this data set again.

• Using the VIEWTABLE command in the command box at the top left

In the command box at the top left, type: vt cdat.stocks and then press Enter.

The SAS data set is shown in the VIEWTABLE window, the same as with SAS Explorer.

• Using the FSVIEW command in the command box at the top left

In the command box at the top left, type: fsv cdat.stocks and then press Enter.

The SAS data set is shown in an FSVIEW window, with horizontal and vertical scroll bars if the data are larger than the screen.

SAS Data Set Terminology

SAS Data Set <----> SAS Table

Variable <----> Column

Observation <----> Row

To view descriptive information about a SAS data set, use PROC CONTENTS:

Proc contents data=cdat.stocks;

Run;

• A SAS data set has two parts: the descriptor portion and the data portion. PROC CONTENTS displays the descriptor portion.

PROC CONTENTS Example:

Libname cdat '/courses/ddbf9765ba27fe300';

PROC CONTENTS DATA=CDAT.STOCKS;

RUN;

-----------------------

Figure 1: Illustration of the SAS environment

Reading Data

Reading raw data, text file, or flat file

* Text data as part of the program;

Libname out 'c:\temp';

data stocks;

input ticker $ price Industry $;

cards; *** or : datalines;

ATT 55.25 TECH

LU 48.8 TECH

MSFT 67.87 TECH

;

run;

* Text data is in a text file;

Libname out 'c:\temp';

data out.stocks;

infile 'c:\sas_class\classdata\stock.txt';

input ticker $ price very_long_name $;

run;

Reading SAS Dataset: *.sas7bdat

Libname in 'c:\temp';

Libname out 'c:\temp';

* Data step;

Data out.outdata;

set in.indata;

Run;

* Proc Step;

Proc contents

data=in.indata;

Run;

Proc SQL;

Select count(*)

From in.indata;

Quit;

List Input Column Input Formatted Input
