Lab Objectives - Stanford University



Lab One: SAS Orientation

Also: 2x2 Tables, PROC FREQ, Odds Ratios, Risk Ratios

Lab Objectives

After today’s lab you should be able to:

1. Load SAS program.

2. Move between the EDITOR, LOG, and OUTPUT windows, and understand their different functions.

3. Understand SAS libraries, including the SAS temporary library (the “WORK” library).

4. Use the Explorer Browser in SAS.

5. Understand how to write comments in SAS.

6. Understand the basic structure of a SAS program and SAS code.

7. Understand the difference between SAS datasteps and SAS procedures.

8. Use SAS as a calculator.

9. Know some SAS logical and mathematical operators.

10. Assign a library name (libname statement and point-and-click).

11. Input grouped data directly into SAS.

12. Use PROC FREQ to output contingency tables.

13. Use PROC FREQ to calculate chi-square statistics and odds ratios and risk ratios.

14. Understand the concept of a SAS macro (just a function).

15. If time, create a simple SAS macro to calculate the confidence intervals for an odds ratio.

LAB EXERCISE STEPS:

Follow along with the computer in front

1. Open SAS: From the desktop → double-click “Applications” → double-click the SAS icon.

2. There are 3 windows in SAS: the editor, output, and log windows.

a. You enter SAS code into the editor (the enhanced editor screen alerts you to potential errors through its coloring scheme). You run SAS programs that appear in the editor by clicking on the running man icon in your toolbar.

b. After a program runs, the output appears in the output screen.

c. The execution of a program is logged in the log screen, as are errors.

You can open the editor, output, or log windows by selecting them in the “VIEW” menu at the top of your screen.

3. SAS programs are composed of data steps and procedures (abbreviated as PROCs). Data-steps deal with importing, entering, and manipulating data. Procedures deal with analyzing data (making numerical or graphical summaries and running specific statistical tests). We will first work with SAS datasteps:

Type the following data step in the editor window:

data example1;

x=18*10**-6;

run;

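Explanation of code:

data example1;   /* Creates a dataset called example1 in the WORK (default temporary) library */

x=18*10**-6;     /* Assigns a value to x: variable name on the left of the equals sign, expression on the right */

run;             /* Ends the data step; nothing executes until you click the running man icon */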

4. Select (highlight) the code (using your mouse), and click on the running man icon.

5. Use the Explorer Browser on the left hand side of your screen to locate and view the dataset “example1” in the work library (file cabinet icons represent data libraries).

a. Double click on the libraries icon (looks like a filing cabinet).

b. Double click on the work library icon (looks like one drawer in a filing cabinet).

c. Double click on the dataset “example1” to open it in viewtable mode. The dataset should contain a single value.

d. Click on the “up one level” icon (folder with an up-arrow on the toolbar) to return to the library icons.

6. Type the following code in the editor window, and run the program (select the code and click on running man).

data _null_;

x=18*10**-6;

put x;

run;

7. Check what has been entered into the log. It should look like:

5 data _null_;

6 x=18*10**-6;

7 put x;

8 run;

0.000018

NOTE: DATA statement used (Total process time):

real time 0.00 seconds

cpu time 0.00 seconds

8. Using your Explorer Browser, observe that no new datasets have been added to the work library.

9. Type the following code in the editor window and run the program.

data _null_; *use SAS as calculator;

x=LOG(EXP(-.5));

put x;

run;

SAS LOG should contain:

9 data _null_; *use SAS as calculator;

10 x=LOG(EXP(-.5));

11 put x;

12 run;

-0.5
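The same calculator pattern works for the other mathematical functions listed in the Appendix. The following is a minimal sketch, not part of the original lab; the variable names a through f are arbitrary, and the values in the comments are what these standard functions should return.

```sas
data _null_;              /* _null_: calculate without creating a dataset */
  a = sqrt(16);           /* square root: 4 */
  b = abs(-3.2);          /* absolute value: 3.2 */
  c = round(2.718, 0.01); /* round to the nearest 0.01: 2.72 */
  d = mod(17, 5);         /* remainder of 17 divided by 5: 2 */
  e = int(-2.7);          /* integer part (truncates toward zero): -2 */
  f = log10(1000);        /* common logarithm: 3 */
  put a= b= c= d= e= f=;  /* writes name=value pairs to the log */
run;
```

Writing `put a=` rather than `put a` labels each value in the log, which helps when several results are printed at once.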

10. Use SAS to calculate the probability of getting exactly X=25 from a binomial distribution with N=100 and p=0.5 (for example, what’s the probability of getting EXACTLY 25 heads in 100 coin tosses?):

data _null_;

p= pdf('binomial', 25,.5, 100);

put p;

run;

11. Use SAS to calculate the probability of getting an X of 25 or more from a binomial distribution with N=100 and p=0.5 (e.g., 25 or more heads in 100 coin tosses):

data _null_;

pval= 1-cdf('binomial', 24, .5, 100);

put pval;

run;
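The upper-tail calculation works because P(X≥25) = 1 − P(X≤24). As a rough check, and only as an illustrative sketch, the same probability can be built by summing the individual binomial probabilities from 25 to 100; the names total and k are hypothetical and not part of the lab.

```sas
data _null_;
  total = 0;
  do k = 25 to 100;                          /* add up P(X=25), P(X=26), ..., P(X=100) */
    total = total + pdf('binomial', k, .5, 100);
  end;
  pval = 1 - cdf('binomial', 24, .5, 100);   /* the shortcut used in step 11 */
  put total= pval=;                          /* the two values should agree up to rounding error */
run;
```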

12. Libraries are references to places on your hard drive where datasets are stored. Datasets that you create in permanent libraries are saved in the folder to which the library refers. Datasets put in the WORK library disappear when you quit SAS (they are not saved).

13. A library name is a temporary reference to a place on your hard drive where datasets are stored: the name lasts only for the current SAS session, even though the datasets in the folder remain. You can assign a library name through the libname statement (step 14) or through point-and-click features, as follows:

a. Click on “new library” icon (slamming file cabinet on the toolbar).

b. Browse to find the path to the Desktop folder. COPY THIS PATH USING CONTROL C.

c. Name the library hrp261.

d. Hit OK to exit and save.

14. Whenever you open a new SAS session, you will need to reassign the library name. Saving the code that does this will save you a step. Type the following code in the editor (and run it) to assign the Desktop folder the library name “hrp261”. USE CONTROL V to paste the path (it may differ on different computers).

libname hrp261 'C:\Documents and Settings\mitl-pc.LANE-LIB\Desktop';

15. Type the following code in the editor to copy the dataset example1 into the hrp261 library as “hrp261.example1”, adding the new variable x2 and dropping x along the way:

data hrp261.example1;

set example1;

x2=x**2;

drop x;

run;

16. Find the dataset in the hrp261 library using the Explorer Browser.

17. Browse to find the example1 dataset in the Desktop folder on your hard drive. This dataset will remain intact after you exit SAS.
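Besides the Explorer Browser, the contents of a library can also be checked from code. The following is a minimal sketch, assuming the libname statement from step 14 and the data step from step 15 have already been run in the current session.

```sas
/* List the datasets stored in the hrp261 library (names only) */
proc contents data=hrp261._all_ nods;
run;

/* Describe the copied dataset; only the variable x2 should be listed */
proc contents data=hrp261.example1;
run;
```

If SAS reports that the library hrp261 is not assigned, rerun the libname statement first.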

18. Next, we will input data from a 2x2 table directly into a SAS dataset. In the SAS editor screen, input the following data set. These are grouped data from the atherosclerosis and depression example (from the Rotterdam study) in lecture 1:

data Rotterdam;

input IsDepressed HasBlockage Freq;

datalines;

1 1 28

1 0 53

0 1 511

0 0 1328

;

run;

/*Use PROC PRINT to view the data*/

proc print data=Rotterdam;

run;

19. Verify that the data have been printed to your output screen as below:

          Is          Has
Obs    Depressed    Blockage    Freq

 1         1           1          28
 2         1           0          53
 3         0           1         511
 4         0           0        1328

20. Generate the 2x2 contingency table using PROC FREQ.

proc freq data=Rotterdam order=data;

tables IsDepressed*HasBlockage /nopercent norow nocol;

weight freq;

run;

RESULTS:

Table of IsDepressed by HasBlockage

IsDepressed     HasBlockage

Frequency|       1|       0|  Total
---------+--------+--------+
        1|     28 |     53 |     81
---------+--------+--------+
        0|    511 |   1328 |   1839
---------+--------+--------+
Total         539     1381     1920
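The weight statement is what tells PROC FREQ that each data line stands for many subjects. The sketch below, which is not part of the original lab, assumes the Rotterdam dataset from step 18 is in the WORK library; the dataset name RotterdamLong and the counter i are hypothetical.

```sas
/* Without a WEIGHT statement, each of the 4 data lines counts as one observation */
proc freq data=Rotterdam order=data;
  tables IsDepressed*HasBlockage / nopercent norow nocol;
run;

/* Alternative: expand the grouped data to one row per subject */
data RotterdamLong;
  set Rotterdam;
  do i = 1 to Freq;      /* write each line Freq times */
    output;
  end;
  drop i Freq;
run;

/* The expanded data need no WEIGHT statement to reproduce the 2x2 table above */
proc freq data=RotterdamLong order=data;
  tables IsDepressed*HasBlockage / nopercent norow nocol;
run;
```

The first PROC FREQ shows a count of 1 in every cell; the second should match the weighted table, since RotterdamLong has 1920 rows.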

21. Request statistics for contingency tables using PROC FREQ.

proc freq data=Rotterdam order=data;

tables IsDepressed*HasBlockage / chisq measures expected;

weight freq;

run;

RESULTS:

Table of IsDepressed by HasBlockage

IsDepressed     HasBlockage

Frequency|
Expected |
Percent  |
Row Pct  |
Col Pct  |       0|       1|  Total
---------+--------+--------+
        0|   1328 |    511 |   1839
         | 1322.7 | 516.26 |
         |  69.17 |  26.61 |  95.78
         |  72.21 |  27.79 |
         |  96.16 |  94.81 |
---------+--------+--------+
        1|     53 |     28 |     81
         | 58.261 | 22.739 |
         |   2.76 |   1.46 |   4.22
         |  65.43 |  34.57 |
         |   3.84 |   5.19 |
---------+--------+--------+
Total        1381      539     1920
            71.93    28.07   100.00

Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     1      1.7668    0.1838
Likelihood Ratio Chi-Square    1      1.6976    0.1926
Continuity Adj. Chi-Square     1      1.4469    0.2290
Mantel-Haenszel Chi-Square     1      1.7659    0.1839
Phi Coefficient                       0.0303
Contingency Coefficient               0.0303
Cramer's V                            0.0303

Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)      1328
Left-sided Pr <= F             ...
Right-sided Pr >= F         0.1157

Table Probability (P)       0.0407
Two-sided Pr <= P              ...

[...]

| >= or ge | + addition    |
|          | - subtraction |

| INT(v) - returns the integer value (truncates)            | SIGN(v) - returns the sign of the argument or 0 |
| ROUND(v) - rounds a value to the nearest round-off unit   | SQRT(v) - calculates the square root            |
| TRUNC(v) - truncates a numeric value to a specified length | EXP(v) - raises e (2.71828) to a specified power |
| ABS(v) - returns the absolute value                       | LOG(v) - calculates the natural logarithm (base e) |
| MOD(v) - calculates the remainder                         | LOG10(v) - calculates the common logarithm      |

APPENDIX B: Some useful probability functions in SAS

Normal Distribution

➢ Cumulative distribution function of standard normal:

P(X≤Z)=probnorm(Z)

➢ Z value that corresponds to a given area of a standard normal (probit function):

Z = Φ⁻¹(area) = probit(area)

➢ To generate random Z → normal(seed)
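The two normal functions are inverses of each other, which a short sketch can illustrate. This example is not from the original lab; the variable names area and z are arbitrary.

```sas
data _null_;
  area = probnorm(1.96);   /* P(Z <= 1.96), approximately 0.975 */
  z    = probit(0.975);    /* Z value with area 0.975 to its left, approximately 1.96 */
  put area= z=;
run;
```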

Exponential

➢ Density function of exponential (λ):

f(k) = pdf('exponential', k, λ)

➢ Cumulative distribution function of exponential (λ):

P(X≤k)= cdf('exponential', k, λ)

➢ To generate random X (where λ=1) → ranexp(seed)

Uniform

f(k) = pdf('uniform', k)

P(X≤k) = cdf('uniform', k)

To generate random X → ranuni(seed)

Binomial

P(X=k) = pdf('binomial', k, p, N)

P(X≤k) = cdf('binomial', k, p, N)

To generate random X → ranbin(seed, N, p)

Poisson

P(X=k) = pdf('poisson', k, λ)

P(X≤k) = cdf('poisson', k, λ)
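The random-number functions listed above can be tried out in a short data step. This is an illustrative sketch only: the dataset name randomdraws, the seed 12345, and the binomial parameters N=100 and p=0.5 are arbitrary choices, not values from the lab.

```sas
data randomdraws;
  do i = 1 to 5;                   /* five draws from each distribution */
    z = normal(12345);             /* standard normal */
    u = ranuni(12345);             /* uniform on (0,1) */
    e = ranexp(12345);             /* exponential with parameter 1 */
    b = ranbin(12345, 100, 0.5);   /* binomial with N=100, p=0.5 */
    output;                        /* write one row per iteration */
  end;
run;

proc print data=randomdraws;
run;
```

Rerunning with the same positive seed should reproduce the same values, which is convenient when results need to be checked.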

-----------------------

This is a SAS data step. The first line tells SAS to create a dataset called “example1.” This dataset will be placed into the “work” library, which is the default temporary library.

Same as above but the “_null_” tells SAS to not bother to make a dataset (e.g., if you just want to use SAS as a calculator).

Note that each command in a SAS program must be punctuated with a semi-colon. Misplaced or missing semi-colons cause many errors and much frustration in SAS, so pay attention to their placement!

Assigns a value to the variable x.

Variable name goes to the left of the equals sign; value or expression goes to the right of the equals sign.

Note that each data step or proc in SAS ends with a run statement. The program is not actually executed, however, until you click on the running man icon.

Tells SAS to print the value of x in the SAS log.

Adds a new variable x-squared to the dataset.

Drops the variable x; “keep x2;” would have the same result.

Starts with the dataset work.example1

Makes a new dataset called example1 in the hrp261 library.

Code for moving a dataset, part of a dataset, or a dataset with modifications into a new library.

Name the library

Note use of informative variable names.

|                |Depressed |Not depressed |

|Atherosclerosis |28        |511           |

|None            |53        |1328          |

Use SAS as a calculator. See Appendix for more mathematical and logical operators.

Don’t forget the semi-colon!

Location of the folder where the datasets are physically located.


Comments (ignored by SAS but critical for programmers and users) are bracketed by /* and */ and should appear green in the editor.

Comments may also be bracketed by * and ;
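Both comment styles can be seen side by side in a short sketch. This example is illustrative and not taken from the lab; note that a comment starting with * ends at the first semicolon, so it cannot contain one.

```sas
* This statement-style comment ends at the semicolon;

/* This block-style comment
   can span several lines */

data _null_;
  x = 2 + 2;   /* a comment can also follow a statement on the same line */
  put x;
run;
```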

Options (optional features) follow a forward slash in a SAS procedure.

These options tell SAS to present the chi-square statistic as well as measures of association (odds ratios and risk ratios).

Asks SAS to present the expected table for the chi-square test.

See Appendix for more probability functions.

This is your first example of a SAS procedure.

The print procedure simply prints data in the output screen.

Column1 risk ratio = (probability of NOT having atherosclerosis if you are NOT depressed) / (probability of NOT having atherosclerosis if you are depressed)

Column2 risk ratio = (probability of having atherosclerosis if you are NOT depressed) / (probability of also having atherosclerosis if you are depressed)

Chi-square is non-significant.

Expected counts are highlighted here.

Tells SAS how to ORDER the rows and columns. The default is to use numerical or alphabetical order, which would make cell a the “undepressed, unblocked” cell. Instead, order=data tells SAS to order rows and columns according to the order that the values appear in the dataset (1s before 0s).

Fisher’s exact is automatically calculated when you request chi-square statistics for a 2x2 table.


PREVIEW: We will later learn the use of PROC FORMAT to change 0’s and 1’s to meaningful labels.


If you forget the weight statement, SAS will see only 1 observation in each cell of your 2x2 table.

The variable “freq” stores the counts in each 2x2 cell.


The probit function returns the Z score associated with a given area under a normal curve.

When creating a macro, it’s important to include detailed comments that instruct a new user on how to use your macro.
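The lab's own macro is not reproduced in this excerpt, so the following is only a hypothetical sketch of what a commented macro for an odds ratio confidence interval might look like. The macro name orci, its parameters, and the variable names are all assumptions; the interval uses the standard log-odds-ratio formula, exp(ln(OR) ± Z × sqrt(1/a + 1/b + 1/c + 1/d)), with Z obtained from the probit function.

```sas
/*--------------------------------------------------------------
  Hypothetical macro ORCI (illustration only)
  Purpose:   odds ratio and confidence interval for a 2x2 table
  Arguments: a, b, c, d = the four cell counts
                          (a = exposed with outcome, b = exposed without,
                           c = unexposed with outcome, d = unexposed without)
             alpha      = 1 minus the confidence level (default 0.05)
  Example:   %orci(28, 53, 511, 1328);
  Output:    writes the odds ratio and its limits to the log
--------------------------------------------------------------*/
%macro orci(a, b, c, d, alpha=0.05);
data _null_;
  oddsratio = (&a * &d) / (&b * &c);
  se    = sqrt(1/&a + 1/&b + 1/&c + 1/&d);   /* standard error of ln(OR) */
  z     = probit(1 - &alpha/2);              /* e.g., about 1.96 for alpha=0.05 */
  lower = exp(log(oddsratio) - z*se);
  upper = exp(log(oddsratio) + z*se);
  put oddsratio= lower= upper=;
run;
%mend orci;

/* Call the macro with the Rotterdam counts from step 18 */
%orci(28, 53, 511, 1328);
```

Keeping the usage notes in a comment header at the top means a new user can run the macro without reading the code inside it.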

Options (optional features) follow a forward slash in a SAS procedure.

These options tell SAS to omit the cell, row, and column percents in the 2x2 table.

|          |Has outcome |No outcome |

|Exposed   |a           |b          |

|Unexposed |c           |d          |
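As a worked illustration of this layout, the standard formulas for the odds ratio and risk ratio can be applied to the Rotterdam counts. The mapping used here is an assumption based on the order=data table in step 20: exposed = depressed, outcome = atherosclerosis, so a=28, b=53, c=511, d=1328.

```latex
\mathrm{OR} = \frac{a\,d}{b\,c}
            = \frac{28 \times 1328}{53 \times 511}
            = \frac{37184}{27083} \approx 1.37

\mathrm{RR} = \frac{a/(a+b)}{c/(c+d)}
            = \frac{28/81}{511/1839} \approx 1.24
```

The risk ratio is closer to 1 than the odds ratio here, as expected when the outcome is not rare.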
