Introduction to SAS




1. General

The Statistical Analysis System (SAS) is a comprehensive set of facilities for data management, reporting, and user interface design. It contains tools for a wide range of applications, from the simplest report to the most complex statistical analysis. The SAS system, which consists of several modules, is available in its entirety on the Bloomsburg University computer network. In the WINDOWS operating system, press START and select PROGRAMS. You will see SAS as one of the choices. Select this option and you will see the SAS windows: the program window, the output window and the log window. The program window is the SAS editor for writing and editing programs. The output window displays the results of your program after it runs, and the log window records notes, warnings and error messages so that you can trace how the program was processed.

2. Structure of a SAS Program

A SAS program consists of a series of steps, each written as statements in the SAS language. There are generally two types of steps: DATA steps and PROC steps. A DATA step contains statements for reading, modifying and transforming data; it can also write the data to one or more SAS data sets. A PROC step uses one of the standard SAS procedures to carry out a specific task on the input data, e.g., hypothesis testing.
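To make the two kinds of steps concrete, here is a minimal sketch of a complete program: one DATA step that reads three made-up records and one PROC step that summarizes them. The data set name SCORES, the variable names and the values are all hypothetical.

```sas
/* DATA step: create a temporary data set, WORK.SCORES, from inline data */
data scores;
   input student $ score;   /* $ marks STUDENT as a character variable */
   datalines;
ann 72
bob 85
cal 90
;
run;

/* PROC step: apply a standard procedure to the data set just created */
proc means data=scores;
   var score;
run;
```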

3. Running a SAS Program

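Once you have written a SAS program, either in a text editor or interactively in the program window, submit it for processing by clicking the Submit (RUN) button on the toolbar or choosing Submit from the Run menu. End each step with a RUN statement (RUN;). Without it, SAS may not execute the last step of the program until another step begins.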

4. Basic Syntax Rules

A SAS program consists of a series of statements. The rules for writing these statements follow.

Words:

• Words in statements must be separated by one or more blanks.

• A word may not be split between lines.

• Words may be in upper, lower, or mixed case.

Variable names:

• Variable names may be 1 to 32 characters long (SAS releases before Version 7 limited them to 8 characters).

• All variable names must begin with an alphabetic character (A-Z, a-z) or an underscore (_). Subsequent characters may include digits.

• A variable list such as V1-V5 means V1, V2, V3, V4, and V5.

• Variable names are not case-sensitive, but otherwise they must match character for character: V1 is the same as v1, but V1 is not the same as V01.

• Variable names may not contain embedded blanks. V1 and V_1 are acceptable; V 1 is not.

• Certain names have special meaning to SAS, e.g., _N_, _TYPE_, and _NAME_. Similarly, comparison and logical operators such as GE, LT, EQ, AND, and OR should not be used as variable names.
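A short sketch of the variable-name rules, using a hypothetical data set DEMO: the list V1-V5 on the INPUT statement creates five variables, and because names are not case-sensitive, the lowercase list v1-v5 inside SUM refers to the same five variables.

```sas
data demo;
   input V1-V5;             /* creates V1, V2, V3, V4 and V5 */
   total = sum(of v1-v5);   /* case does not matter: v1 is the same variable as V1 */
   datalines;
1 2 3 4 5
;
run;
```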

Statements:

• A statement may begin anywhere on a line and may be continued on additional lines as necessary.

• Statements end with a semicolon (;).

• Statements which begin with an asterisk (*) are treated as comments and are not interpreted. A comment is concluded with a semicolon.

• Everything between /* and */ is ignored (block comment), so such a comment may span several statements or lines; semicolons inside /*…*/ have no effect.

• Multiple statements may appear on a line; they must be separated by semicolons.
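The statement rules can be seen together in the hypothetical DATA step below (the data set name LAYOUT is made up): a comment statement, a block comment containing semicolons, two statements on one line, and one statement continued onto a second line.

```sas
* A comment statement begins with an asterisk and ends at the semicolon;

/* A block comment; the semicolons in here; are ignored */

data layout;
   x = 1; y = 2;     /* two statements on one line */
   z = x
       + y;          /* one statement continued on a second line */
run;
```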

5. The Data Step

• The data step begins with the word DATA followed by a name for the temporary or permanent data set to be output by the data step. See the sample programs which create and use temporary SAS data sets.

• The data step includes instructions about where to find the data and how to read the values from the data file.

• The data step may contain instructions to create new variables or transform existing ones, label variables, and select cases or variables. The following statements are examples of valid statements for the SAS data step:

y = sum(of x1-x15);
label y = 'total score';
if y > 10 then group = 1;
else group = 2;
keep group y;

• A missing value for a numeric variable is written as a period (.). For example, the statement if a = 99 then a = .; makes SAS treat a value of 99 as missing.

• All data step statements must appear within a data step. To use data step statements after a PROC, you must begin a new data step, typically one that reads the existing data set with a SET statement.
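Putting the example statements into a complete DATA step gives the sketch below. It assumes a hypothetical layout in which each record holds an ID followed by 15 item scores, x1-x15, and in which 99 is a code for a missing answer to the first item; the data set name SCORED and the values are made up. Because the SUM function ignores missing arguments, recoding 99 to a period keeps it out of the total.

```sas
data scored;
   input id x1-x15;
   if x1 = 99 then x1 = .;     /* treat the code 99 as a missing value */
   y = sum(of x1-x15);         /* SUM skips missing values */
   label y = 'total score';
   if y > 10 then group = 1;
   else group = 2;
   keep id group y;            /* keep only these variables in SCORED */
   datalines;
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 99 0 0 0 0 0 0 0 0 0 0 0 0 0 1
;
run;
```

Without the recode, the second record would total 100 and fall into group 1 instead of group 2.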

6. SAS PROCS

• SAS PROCs (procedures) are used for many purposes including carrying out statistical analysis (e.g., PROC REG, PROC MEANS), displaying information about a SAS data set (e.g., PROC CONTENTS, PROC PRINT), and creating graphs (PROC PLOT).

• Most PROCs produce output of some kind. The output of statistical PROCs usually appears in the output window (the listing file).

• The PROC(s) must appear after a data step which creates the SAS data set used in the procedure.

• The word PROC automatically terminates a SAS data step.

• Data step commands may not appear after a PROC unless a new data step is initiated with the word DATA.

• A SAS PROC begins with the word PROC followed by the name of the specific procedure (e.g., PROC REG).

• Some PROCs have options or subcommands which allow the user to output information into a SAS data set (e.g., PROC UNIVARIATE, PROC REG).

• By default, a PROC uses the most recently created SAS data set, whether it was created by a DATA step or by a PROC that writes an output data set. To use a different data set, specify the DATA= option on the PROC statement (e.g., PROC PRINT DATA=CLASS;).
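A minimal sketch of the default-data-set rule, using the four HEIGHT values from Sample Program 1 in a hypothetical data set HEIGHTS: PROC MEANS writes its result to a new data set, STATS, through its OUTPUT statement, so a later PROC without DATA= uses STATS, and DATA= is needed to return to HEIGHTS. The names HEIGHTS, STATS and AVG_HEIGHT are made up for illustration.

```sas
data heights;
   input height @@;          /* @@ reads several observations from one line */
   datalines;
59 57 69 56
;
run;

/* PROC MEANS creates a new data set, STATS, holding the mean */
proc means data=heights noprint;
   var height;
   output out=stats mean=avg_height;
run;

/* No DATA= option: PROC PRINT uses the most recently created data set, STATS */
proc print;
run;

/* DATA= overrides the default */
proc print data=heights;
run;
```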

7. Miscellaneous Commands

• The OPTIONS statement sets system options for the current session. For example, OPTIONS NOCENTER LINESIZE=80; sets the line size of the listing output to 80 columns and left-justifies the output instead of centering it on the page.

• The INFILE statement identifies an external file from which the INPUT statement reads data. An example of an INFILE statement appears in the third sample program (Section 10).
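The sketch below shows both statements. Because the external file used in the third sample program may not be available, it first writes two made-up records to a temporary file created with FILENAME ... TEMP and then reads them back with INFILE; the fileref DEMO and the data set name READBACK are hypothetical.

```sas
options nocenter linesize=80;   /* left-justify output; 80-column lines */

/* Create a temporary external file with two hypothetical records */
filename demo temp;
data _null_;
   file demo;
   put '1 10.5';
   put '2 12.0';
run;

/* INFILE tells the INPUT statement to read from the external file */
data readback;
   infile demo;
   input id score;
run;

proc print data=readback;
run;
```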

8. Sample Program 1

In this example, the SAS program reads data arranged in fixed columns from an inline source and uses two PROCs.

Program

DATA CLASS;
INPUT NAME $ 1-8 SEX $ 10 AGE 12-13 HEIGHT 15-16 WEIGHT 18-22;
CARDS;
JOHN     M 12 59  99.5
JAMES    M 12 57  83.0
ALFRED   M 14 69 112.5
ALICE    F 13 56  84.0
;
PROC MEANS;
VAR AGE HEIGHT WEIGHT;
PROC PLOT;
PLOT WEIGHT*HEIGHT;
RUN;

Explanation

The Data Step

• The DATA statement names the data set to be created. Because the name CLASS has no library prefix, SAS creates a temporary data set called WORK.CLASS, which is deleted at the end of the session.

• The INPUT statement tells SAS how to read each variable; here each value is read from fixed columns (column input):

1. NAME: this is an alphanumeric variable, as indicated by the $. The variable NAME has been assigned columns 1-8.

2. SEX: this is also an alphanumeric variable, and has been assigned column 10.

3. AGE: this is a numeric variable, and has been assigned columns 12-13.

4. HEIGHT: numeric variable, columns 15-16.

5. WEIGHT: numeric variable, columns 18-22.

• The CARDS statement tells SAS that the data lines follow immediately (DATALINES is an equivalent, newer name for it). A line containing only a semicolon marks the end of the data.

The PROCS

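1. The first procedure, PROC MEANS, calculates the default summary statistics (number of observations, mean, standard deviation, minimum and maximum) for each variable in the VAR statement: AGE, HEIGHT and WEIGHT.

2. The second, PROC PLOT, plots WEIGHT (vertical axis) against HEIGHT (horizontal axis).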
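Column input works only if every value sits in exactly the stated columns. When the values are simply separated by blanks, list input (variable names without column ranges, as in the third sample program) reads the same records. The sketch below does this with DATALINES, the newer name for CARDS, and then asks PROC MEANS for only N and the mean; the data set name CLASS2 is hypothetical.

```sas
data class2;
   input name $ sex $ age height weight;   /* list input: blank-separated values */
   datalines;
JOHN M 12 59 99.5
JAMES M 12 57 83.0
ALFRED M 14 69 112.5
ALICE F 13 56 84.0
;
run;

/* Statistic keywords on the PROC statement replace the default list */
proc means data=class2 n mean;
   var age height weight;
run;
```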

9. Sample Program 2

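In this example, we summarize cell counts for bronchitis (Y/N) by level of organic particulates (H/L) and age group (1-3). Each data line gives the number of subjects, N, in one cell, so the WEIGHT statement tells PROC FREQ to count each line N times. The TABLES statement requests a 3x2 table of age group by level, a 2x2 table of level by bronchitis, and, for the three-way request, a separate 2x2 table of level by bronchitis for each age group. ORDER=DATA lists the categories in the order they first appear in the data (H before L, Y before N):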

DATA BRONCHITIS;
INPUT AGEGRP LEVEL $ BRONCH $ N;
CARDS;
1 H Y 20
1 H N 382
1 L Y 9
1 L N 214
2 H Y 10
2 H N 172
2 L Y 7
2 L N 120
3 H Y 12
3 H N 327
3 L Y 6
3 L N 183
;
proc freq data=bronchitis order=data;
tables agegrp*level level*bronch agegrp*level*bronch;
weight n;
run;
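One way to see what the WEIGHT statement does is to expand the counts into one record per subject. The sketch below does this for the age group 1 counts from Sample Program 2; the data set names COUNTS and PERSONS are hypothetical. Both PROC FREQ calls should print the same LEVEL by BRONCH table.

```sas
/* Cell counts for age group 1 only */
data counts;
   input level $ bronch $ n;
   datalines;
H Y 20
H N 382
L Y 9
L N 214
;
run;

/* Expand each row into N individual records */
data persons;
   set counts;
   do i = 1 to n;
      output;
   end;
   drop i n;
run;

/* Weighted counts ... */
proc freq data=counts order=data;
   tables level*bronch;
   weight n;
run;

/* ... give the same table as one record per subject */
proc freq data=persons order=data;
   tables level*bronch;
run;
```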

10. Sample Program 3

In this example, the program reads space-separated data from an external file and uses three PROCs.

Program

DATA auto;
INFILE 'a:\car.dat';
*The data set CAR.DAT can be retrieved from the data link on this web site;
* y = cost, x1 = price, x2 = miles;
input id y x1 x2;
options pagesize=35 linesize=75;
Proc univariate plot normal;
var x1 x2;
Proc means min max;
var x1 x2;
Proc chart;
vbar x1; vbar x2;
hbar y;
run;

Explanation

The Data Step

1. The INFILE statement tells SAS to read the data from the external file car.dat on drive A:.

2. The variables (id, y, x1, x2) are read with list input, because the values (all numeric in this example) are separated by spaces in the file.

The PROCS

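1. PROC UNIVARIATE: computes detailed summary statistics for the metric variables x1 and x2. The PLOT option adds a stem-and-leaf plot, a box plot and a normal probability plot, and the NORMAL option adds tests of normality.

2. PROC MEANS: the MIN and MAX options limit the output to the minimum and maximum of x1 and x2 (without them, PROC MEANS reports N, mean, standard deviation, minimum and maximum).

3. PROC CHART: produces vertical bar charts for x1 and x2 and a horizontal bar chart for y.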
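List input is forgiving about spacing: values may be separated by one or more blanks, and a period in the data is read as a missing numeric value. The sketch below illustrates this with two made-up records in the same layout as car.dat (id, y, x1, x2); the data set name AUTO_DEMO and the values are hypothetical.

```sas
data auto_demo;
   input id y x1 x2;       /* list input: any number of blanks between values */
   datalines;
1 250   12000 30000
2   310 15000     .
;
run;

proc means data=auto_demo min max;
   var x1 x2;
run;
```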
