Homework #4 PH5420



PH6420 Fall 2017: Assignment 3

Overview

Write one program for Parts A and B and one program for Part C. When creating new variables, make sure that you account for possible missing data in the original variables.

Submit the two SAS programs answering all questions. You may include the output if it helps you answer the questions, but it is not required.

PART A:

1. Create a SAS dataset that reads in the variables age, sex, income, educ, hdlbl, potassbl, and potass12 from tomhs.dat.

2. Within the data step, create a new variable called agecat that divides age into 5 categories: 45-49; 50-54; 55-59; 60-64; 65-69. Assign the values 1-5 to these categories.

3. Create a new variable called income40 equal to 0 if income is less than $40,000 and equal to 1 if income is $40,000 or more. Create a new variable called collgrad equal to 0 if the participant did not graduate from college and equal to 1 if the participant graduated from college. (See the study forms for the income and education categories.)

4. Create a new variable that is the smaller of the two serum potassium values (variables potassbl and potass12).

5. Run PROC MEANS for all variables to help verify that you read in the data and created the new variables correctly. (For example, income and income40 should have the same number of valid observations.) What percentage of participants graduated from college? What percentage of participants have incomes of $40,000 or more?

6. Using PROC FREQ, display the 2 by 2 table of income40 and collgrad. Are college graduates more likely than non-graduates to make $40,000 or more? Give summary statistics to justify your answer. (You may add the CHISQ option to the TABLES statement if you want to test statistically whether the two variables are related.)

7. Display the frequency distribution of the smaller potassium value. What percentage of participants had a serum potassium less than 3.5?

8. Most studies show that women have lower rates of heart disease than men. One proposed mechanism for this lower rate is that women have higher HDL cholesterol than men. Run a SAS procedure that addresses this question using the TOMHS data. Do women tend to have higher HDL than men?
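The following is a minimal sketch of how steps 2, 4, 7, and 8 might look. It is not a complete solution. It assumes that step 1 has already produced a dataset, here hypothetically named tomhs, and it invents the names tomhs2 and minpotass for illustration. The income40 and collgrad variables are left out because their coding depends on the study forms.

```sas
/* Sketch only: assumes a dataset named tomhs (hypothetical name)   */
/* already contains age, sex, hdlbl, potassbl, and potass12.        */
data tomhs2;
  set tomhs;

  /* A missing value compares lower than any number in SAS, so a   */
  /* missing age fails every test below and agecat stays missing.  */
  /* Ages outside 45-69 are also left missing.                     */
  if      45 <= age < 50 then agecat = 1;
  else if 50 <= age < 55 then agecat = 2;
  else if 55 <= age < 60 then agecat = 3;
  else if 60 <= age < 65 then agecat = 4;
  else if 65 <= age < 70 then agecat = 5;

  /* MIN ignores missing arguments: the result is missing only     */
  /* when both potassium values are missing.                       */
  minpotass = min(potassbl, potass12);
run;

/* Frequency distribution of the smaller potassium value;          */
/* the cumulative percent column helps answer the "< 3.5" question. */
proc freq data=tomhs2;
  tables minpotass;
run;

/* One way to compare HDL by sex: summary statistics per group.    */
proc means data=tomhs2 n mean std min max;
  class sex;
  var hdlbl;
run;
```

Because MIN returns the one available value when the other is missing, decide whether that behavior is what you want for participants with only one potassium measurement. PROC TTEST with a CLASS statement is another option for step 8 if a formal test is wanted.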

PART B:

1. Add syntax to your program from Part A to create suitable formats (use PROC FORMAT) for sex, income, and agecat. (This can be put at the top of your program or after the data step.)

2. Add suitable labels for each variable read in and each new variable. This can be put in the data step.

3. Use PROC FREQ to obtain a frequency distribution of sex, income, and agecat. Apply the formats you created so that the formatted values are displayed rather than the numeric values. What percentage of subjects are 65 years or older?
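As an illustration of the format and label steps, here is a small sketch for agecat only. It assumes a dataset (hypothetically named tomhs2) that already contains agecat coded 1-5 as described in Part A. The format name agecatf and the label text are made up for this example. Formats for sex and income are not shown because their codes come from the study forms.

```sas
/* Sketch only: agecatf is a hypothetical format name.             */
/* Format names may not end in a digit.                            */
proc format;
  value agecatf
    1 = '45-49'
    2 = '50-54'
    3 = '55-59'
    4 = '60-64'
    5 = '65-69';
run;

/* Assumes tomhs2 (hypothetical name) already contains agecat.     */
/* A FORMAT or LABEL statement inside a procedure applies only     */
/* to that procedure's output.                                     */
proc freq data=tomhs2;
  tables agecat;
  format agecat agecatf.;
  label agecat = 'Age category (years)';
run;
```

Placing the FORMAT and LABEL statements in the data step instead attaches them to the dataset permanently, so every later procedure displays them without repeating the statements.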

PART C:

The following observations are made-up data from four hospital stays. The data are comma delimited with the variables: patient ID, date of birth of patient, date of hospital admission, date of hospital discharge, and total cost of stay in dollars.

001,10/21/1946,12/12/2004,12/14/2004,8000

002,05/01/1980,07/08/2004,08/08/2004,12000

003,01/01/1960,01/01/2004,01/04/2004,9000

004,06/23/1998,11/11/2004,12/25/2004,15123

1. Create a SAS dataset that reads in the data. Call the variables id, dob, admit, dischrg, and fee. Read in id as a character variable. You may type the data within the program or read it in from the data file hosp.csv on the class website. Note: To input the date variables you will need to use colon modifiers on the input statement (See Lecture 2).

2. Compute a variable called staydays, which is the number of days spent in the hospital.

3. Create a variable called age that is the age of the patient in years at time of admission.

4. Create a variable called costperday that is the average cost per day.

5. Use PROC PRINT to display all the variables. Provide date formats for each date variable.
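The steps above can be sketched as follows, with the four records typed in as datalines. This is one possible approach rather than the required one. It makes three assumptions: staydays is counted as discharge date minus admission date (so the discharge day itself is not counted), age is computed with YRDIF and its 'AGE' basis (available in SAS 9.3 and later), and the dataset name hosp is chosen for convenience.

```sas
data hosp;
  /* DSD with DLM=',' reads comma-delimited fields.                */
  infile datalines dlm=',' dsd;

  /* Colon modifiers let an informat be used with list input;      */
  /* $3. keeps id as character so the leading zeros are preserved. */
  input id :$3. dob :mmddyy10. admit :mmddyy10. dischrg :mmddyy10. fee;

  /* SAS dates are day counts, so subtraction gives days.          */
  /* Assumption: the discharge day is not counted as a stay day.   */
  staydays = dischrg - admit;

  /* Age in years at admission (fractional); wrap in FLOOR() for   */
  /* completed years if that is preferred.                         */
  age = yrdif(dob, admit, 'AGE');

  /* Guard against division by zero or a missing stay length.      */
  if staydays > 0 then costperday = fee / staydays;

  format dob admit dischrg mmddyy10.;
  datalines;
001,10/21/1946,12/12/2004,12/14/2004,8000
002,05/01/1980,07/08/2004,08/08/2004,12000
003,01/01/1960,01/01/2004,01/04/2004,9000
004,06/23/1998,11/11/2004,12/25/2004,15123
;
run;

proc print data=hosp;
run;
```

To read from hosp.csv instead, the INFILE statement would point to the file path in place of DATALINES, and the DATALINES block would be dropped.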
