Introduction to SAS Workshop



Biostat 510: Statistical Computing Packages

SAS Homework 1

Due Tuesday, Jan 18, 2005

Topics:

Importing an Excel file

Creating a permanent SAS data set

Setting up missing values

Frequencies

Histograms

Simple descriptive statistics

1. Make two Excel files containing the data collected on the first day of class.

a. Name the first file HRTRATE1.XLS, and the second file HRTRATE2.XLS.

b. Be sure to use consistent values for character variables such as Gender (F,M), Enter the appropriate missing values codes (such as 999) for missing data.

2. Import the Excel files into SAS, using the Import Wizard, and save them as temporary SAS data sets called HRTRATE1 and HRTRATE2 in the Work library. Save the commands used to import the HRTRATE1 as heartrate.sas. Reopen this file and use it as the basis for your commands for this assignment. Save your command file after making any changes.

a. Check the log to be sure your data sets were imported correctly. If not,

b. Be sure each Excel file is closed before you try to open it in SAS.

c. You can also try re-opening the Excel file and saving it as an Excel 4.0 worksheet. This will require you to modify the commands you use to import the data set to tell SAS that the file is from Excel 4.0.

3. Open the command file (homework1.sas) that you saved from importing the HRTRATE1 data set and use it as the basis for the rest of the commands for the assignment. Copy and paste the commands for importing HRTRATE1.xls and use them as the basis of the commands to import HRRATE2.xls. The commands for importing your original data should be something like those shown below. You may need to modify these commands to specify the location of your Excel file and the name of the sheet, etc.

PROC IMPORT OUT= WORK.HRTRATE1

DATAFILE= "C:\HRTRATE1.XLS"

DBMS=EXCEL REPLACE;

SHEET="HRTRATE1$";

GETNAMES=YES;

MIXED=NO;

SCANTEXT=YES;

USEDATE=YES;

SCANTIME=YES;

RUN;

4. Get descriptive statistics for all your numeric variables the SAS data sets HRTRATE1 and HRTRATE2 using Proc Means. Include this output from proc means in your writeup. Visually check the values for all of the variables.

a. How many students are there in this data set?

b. What are the minimum, maximum and mean weight and height (before removing any missing values). Do these values make sense? Please explain.

c. Your commands should something look like those below, but modified for each data set.

proc means data=hrtrate1;

run;

5. Get frequency tabulations for all of the categorical variables in HRTRATE1 and in HRTRATE2 using Proc Freq. Include these frequency tabulations in your output.

a. Are all of these codes consistent? That is, are the codes for the same response the same? If not, fix them in the Excel file and re-import the data.

b. Discuss the frequency distribution of males and females in this data set. Does it appear to be balanced by sex? (Do there appear to be approximately equal percentages of males and females?)

c. Your commands should be something like those below:

proc freq data=hrtrate1;

tables sex activity ;

run;

6. Get a histogram and descriptive statistics for the continuous variables in both of your data sets, using proc univariate. Include the histogram for weight in your homework write-up. Describe the distribution of each of these variables briefly in words. Notice how the missing value codes appear in the histograms.

proc univariate data=b510.hrtrate1 plot;

var distance weight ;

histogram;

run;

7. Create two new permanent SAS data sets called B510.HRTRATE1 and B510.HRTRATE2 using the SET command.

a. Create a folder called b510 on your h: drive if you are working at a campus computing site, or as a folder in c:\temp if you are working at another computer. You can use this folder for your 510 homeworks throughout the semester.

b. Submit a libname statement to define this folder as a library to SAS, using the V9 engine specification.

c. Submit a data step to save HRTRATE1 and HRTRATE2 as permanent SAS data sets, using commands like those below:

libname b510 v9 "c:\temp\b510";

data b510.hrtrate1;

set work.hrtrate1;

run;

8. Modify your commands to assign missing value codes for each variable, where appropriate, and create new variables, as shown below:

a. Be sure to put the commands to modify your data set within the data step. (i.e., between the Data statement and the Run statement).

b. Rerun the entire data step, so the data set b510.hrtrate2 has been created again, with the changes in it. (It will overwrite your original data set).

c. To make any further changes in this data set, just modify your data step and rerun it. Do not make additional data sets.

d. Create the new variables WTKG and HTCM. Weight in Kilograms is equal to weight in pounds divided by 2.2. Height in centimeters is equal to height in inches divided by 0.39.

e. Create a new variable BMI, which is weight in Kg divided by height in Meters-squared. The range of values for BMI should be about 18 to 40.

data b510.hrtrate2;

set hrtrate;

if weight = 999 then weight = .;

if wtunit = "LB" then wtkg = weight/2.2;

if wtunit = "KG" then wtkg = weight;

/**more new variables here***/

run;

f. NOTE: Be sure to close the file if it is open in Browse Mode (Viewtable) or in INSIGHT before re-running your data step.

9. Get information on both of your new data sets

a. Use Proc Means to get descriptive statistics. Please carefully check the minimum, maximum, n and mean for each variable, since the missing values have now been removed. If they still don't make sense, fix your commands, or fix the data in Excel and re-import it.

b. Use Proc Print to print the first 10 observations of each data set to check the data.

c. Use Proc Contents to get the contents of each data set.

d. Get histograms for all continuous variables again, using Proc Univariate, as above.

e. Include the histogram for WTKG in your homework. Include the output from proc means in your homework. Exclude the case for the baby in the output for proc means and proc univariate.

proc means data=b510.hrtrate1;

where wtkg > 10;

run;

proc print data=b510.hrtrate1(obs=10);

run;

proc contents data=b510.hrtrate1;

run;

proc univariate data=b510.hrtrate1;

where wtkg >10;

var wtkg;

histogram;

run;

10. Save your command file on your h: drive as homework1.sas. Make a printout of your commands and hand it in, along with selected output from SAS. Write up Brief answers to each question. Make sure that your commands can run all at once, by simply clicking on the submit button, without selecting any commands.

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download