


Part 4 – Creating variables STA 506

Outline

4.1 Set statements

4.2 Functions of numeric variables

4.3 Logical statements

4.4 Logical operators

4.5 Drop and keep statements

4.6 More on working with categorical data

4.7 An example based on techniques learned so far

4.1 Set statements

- We will now create a second data set, patientdata2, which is a copy of the first data set, patientdata.

- The set statement reads an existing data set into a new data set.

data patientdata2;

set patdir.patientdata;

run;

- Obviously there is no reason to perform the above for its own sake.

- Between the set and run statements we will add some SAS statements which will create new variables in this data set which are functions of some of the original variables.

- It is generally good practice to not manipulate the data within the original data set.
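- As a minimal sketch of the pattern just described, the program below builds a small stand-in for the patient data and then copies it with a set statement, adding one new variable between set and run. The work data set patientdata, the id values, and the use of datalines are assumptions made so the example runs on its own; in these notes the source data set is patdir.patientdata.

```sas
/* Hypothetical stand-in for patdir.patientdata (ids are made up) */
data patientdata;
    input id $ pre_bp post_bp;
    datalines;
A01 145 120
A02 165 132
A03 162 145
;
run;

/* Copy the data set; new variables are created between set and run */
data patientdata2;
    set patientdata;
    diff_bp = pre_bp - post_bp;   /* the original data set is left untouched */
run;

proc print data=patientdata2;
run;
```

- After this runs, patientdata still has three variables, while patientdata2 has the same rows plus diff_bp.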

4.2 Functions of numeric variables

- Generally, data is manipulated within a SAS data step.

- Inside the data step, the values of variables may be modified or new variables may be created by mathematical or logical operations.

Assignment statements

- The within-row operations to be discussed now will be of the form

new_variable = expression;

- We will create new variables as a function of an existing variable or variables using assignment statements.

- The simplest use of an assignment statement is to create a copy of a variable.

- To make a copy of the race variable only requires the statement

race_also = race;

- The variable race_also is an exact copy of race, without any format and/or label which was associated with race at the time the above statement is used.

Basic mathematical operations

- Often new variables created with data steps are the result of some basic mathematical operations.

- Many of the mathematical operators are obvious.

|Mathematical operation |SAS operators |

|Addition |+ |

|Subtraction |- |

|Division |/ |

|Multiplication |* |

|Exponentiation |** |

- Below are some simple examples of the sort of operations which might be done with the database we are currently working with:

sum_bp = pre_bp + post_bp;

average_bp = sum_bp/2;

diff_bp = pre_bp - post_bp;

- Assignment statements are often used to convert the units of a particular variable. Below we convert the patient height from inches to meters, and the patient weight from pounds to kilograms.

height_m = height/39.37;

weight_kg = weight*0.4536;

- As mentioned above exponentiation is indicated by two asterisks. If we wanted to create a variable which is the square of height (in meters) the appropriate SAS statement is

sqr_height_m = height_m**2;

- The variable bmi, which is a measure of body fat, is equal to weight (kg) divided by the square of height (meters).

- This variable may be created with the statement

bmi = weight_kg/sqr_height_m;

- The above is only valid if the statements that create weight_kg and sqr_height_m precede it in the data step.

- To aid in our understanding, below is a listing of these newly created variables:

Obs  pre_bp  post_bp  sum_bp  average_bp  diff_bp  height  weight  height_m  weight_kg  sqr_height_m      bmi
  1     145      120     265       132.5       25      72     165   1.82880     74.844       3.34452  22.3781
  2     165      132     297       148.5       33      64     123   1.62560     55.793       2.64259  21.1130
  3     162      145     307       153.5       17      78     160   1.98120     72.576       3.92517  18.4899
  4     180      165     345       172.5       15      73     240   1.85420    108.864       3.43807  31.6643
  5     155      134     289       144.5       21      71     255   1.80340    115.668       3.25226  35.5654
  6     151      143     294       147.0        8      63     180   1.60020     81.648       2.56065  31.8857
  7     172      141     313       156.5       31       .     175         .     79.380             .        .
  8     149      129     278       139.0       20      75     210   1.90500     95.256       3.62904  26.2483
  9     166      130     296       148.0       36      63     162   1.60020     73.483       2.56065  28.6971
 10     165      139     304       152.0       26      65     151   1.65100     68.494       2.72581  25.1278

- A series of mathematical operations may be done on the same line by making use of parentheses.

average_bp = (pre_bp + post_bp)/2;

bmi = weight_kg/(height_m**2);

- Alternatively, bmi may have been created with the following single statement:

bmi = (weight*0.4536)/((height/39.37)**2);

- Note that mathematical operations on missing values result in a missing value for that specific observation.

- Since the height is missing for patient A08, any of the above operations involving height would result in a missing value for the new variable.

- As was seen in some of the statements above, the arguments of these operations may also be constants.

- It can be seen that the average bmi is 26.8. The difference between each patient's bmi and the group average can be found with the following statement.

bmi_ave_diff = bmi - 26.8;

- One may overwrite a variable as well. If one felt the original weight variable would not be needed after the conversion to kilograms one could use the statement

weight = weight*0.4536;

- This is not generally recommended and should not be done unless one is absolutely sure the original values will never be needed again.
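- The assignment statements from this subsection can be collected into a single data step. The sketch below assumes a small work data set typed in with datalines, using observations 1, 2 and 7 from the listing (the last has a missing height); the data set names are illustrative only.

```sas
/* Three rows taken from the listing: pre_bp, post_bp, height (in), weight (lb) */
data patientdata;
    input pre_bp post_bp height weight;
    datalines;
145 120 72 165
165 132 64 123
172 141 . 175
;
run;

data patientdata2;
    set patientdata;
    sum_bp       = pre_bp + post_bp;
    average_bp   = sum_bp/2;
    diff_bp      = pre_bp - post_bp;
    height_m     = height/39.37;          /* inches to meters */
    weight_kg    = weight*0.4536;         /* pounds to kilograms */
    sqr_height_m = height_m**2;
    bmi          = weight_kg/sqr_height_m; /* must follow the two lines above */
run;

proc print data=patientdata2;
run;
```

- The computed values should agree with observations 1, 2 and 7 of the listing. For the third row, height_m, sqr_height_m and bmi are missing because height is missing, and SAS writes a note to the log saying that missing values were generated.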

Built-in numeric functions

- There are many mathematical functions built into SAS which are extremely useful in the statistical analysis of data.

- Some of the popular functions are summarized in the table below.

|Function |Purpose |

|log(arg) |Returns the natural log of the argument |

|log10(arg) |Returns the log base 10 of the argument |

|sqrt(arg) |Returns the square root of the argument |

|abs(arg) |Returns the absolute value of the argument |

|ceil(arg) |Returns the smallest integer greater than or equal to the argument |

|floor(arg) |Returns the largest integer less than or equal to the argument |

- The argument for each of the functions may be a constant or, the usual case, a variable.

- As an example, the following statement creates a new variable called log_bmi which contains the natural log of the values contained in the variable bmi.

log_bmi = log(bmi);

- With all the functions above, if the sole argument is missing, the result will be missing.

- SAS also has functions which take not only constants or variables, but also a function-specific parameter.

- As an example, the round function takes an argument, and the round-off-unit one wants to round the argument to: round(arg, round-off-unit).

- If one wanted to round height to the nearest foot we would use the statement

nearest_foot = round(height,12)/12;
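- To see several of these functions side by side, the sketch below applies them to two heights and bmi values taken from the listing, plus one row that is entirely missing. The data set name fn_demo and the new variable names bmi_up and bmi_down are made up for this illustration.

```sas
data fn_demo;
    input height bmi;
    log_bmi      = log(bmi);             /* natural log */
    bmi_up       = ceil(bmi);            /* smallest integer >= bmi */
    bmi_down     = floor(bmi);           /* largest integer <= bmi */
    nearest_foot = round(height,12)/12;  /* nearest multiple of 12 inches, in feet */
    datalines;
72 22.3781
64 21.1130
. .
;
run;

proc print data=fn_demo;
run;
```

- For the first row, bmi_up is 23, bmi_down is 22 and nearest_foot is 6. For the second row, 64 inches rounds to 60 inches, so nearest_foot is 5. Every new variable is missing in the third row, since the sole argument is missing.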

- There also exist many functions in SAS which take a series of variables (or constants) as arguments.

|Function |Purpose |

|mean(arg1, arg2,…) |Returns the mean of the arguments given |

|min(arg1, arg2,…) |Returns the minimum of the arguments given |

|max(arg1, arg2,…) |Returns the maximum of the arguments given |

|std(arg1, arg2,…) |Returns the standard deviation of the arguments given |

|var(arg1, arg2,…) |Returns the variance of the arguments given |

|sum(arg1, arg2,…) |Returns the sum of the arguments given |

- When using the above functions, SAS will compute the requested descriptive statistics using the non-missing values.

- That is, missing values will be ignored.

- Note that one may accomplish the same task multiple ways, but one must be careful in the presence of missing values.

- It can be seen that

average_bp = (pre_bp + post_bp)/2;

and

average_bp = mean(pre_bp, post_bp);

are equivalent statements if there are no missing values.

- There is a difference when missing values are in the data.

- If the observation in pre_bp was missing, the first statement would set average_bp to missing.

- The second statement would set average_bp to the average of the non-missing values, which would be post_bp itself.
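- The difference between the two approaches can be checked directly. The sketch below assumes a two-row data set entered with datalines, where the second row has a missing pre_bp; the names bp_demo, average_arith and average_mean are illustrative only.

```sas
data bp_demo;
    input pre_bp post_bp;
    average_arith = (pre_bp + post_bp)/2;   /* missing if either value is missing */
    average_mean  = mean(pre_bp, post_bp);  /* uses only the non-missing values */
    datalines;
145 120
. 132
;
run;

proc print data=bp_demo;
run;
```

- In the first row both variables equal 132.5. In the second row average_arith is missing, while average_mean is 132, the single non-missing value. Which behavior is appropriate depends on whether an average based on one reading is meaningful for the analysis.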

4.3 Logical statements

If-then statements

- Data manipulations often involve making a decision or calculation based on the value of a variable with respect to a particular value or the value of another variable.

- Typically, this is done with the if-then statement.

- The general form of this type of statement is

if <condition> then <action>;

- If the condition is true then the action is executed.

- As used below, if-then statements are useful to correct data entry mistakes.

if id='D55' then age=71;

- When SAS reads the above in a given program, it checks whether the condition (id='D55') is met, and if so, it sets that particular patient's age to 71.

- If the condition is not met, as would be the case with all but one of the patients, the action is not executed.

- Note that since id is a character variable, specific values must be surrounded by quotes.

- SAS is not case sensitive, with the exception of the values of character variables.

- If id were a numeric variable, the quotes would not be appropriate.
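- The case sensitivity of character values is easy to overlook, so a small check may help. In the sketch below the data set id_demo and the ages are made up; only the id values D55 and A08 appear in these notes, and the lower-case d55 row is added purely for illustration.

```sas
data id_demo;
    input id $ age;
    /* Only the exact upper-case value 'D55' satisfies the condition */
    if id = 'D55' then age = 71;
    datalines;
D55 17
d55 17
A08 64
;
run;

proc print data=id_demo;
run;
```

- Only the first row has its age changed to 71. The row with id d55 is left alone because 'd55' and 'D55' are different character values. If mixed case is possible in the data, one option is to compare upcase(id) to 'D55' instead.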

Comparison operators

- If the condition in an if-then statement is in regards to the order or equality of some values or variables, a comparison operator may be used.

- The table below contains the comparison operators available in SAS.

|Comparison operations |SAS operator |Alternative SAS operator |

|Equal |= |EQ |

|Greater than |> |GT |

|Less than |< |LT |

|Greater than or equal to |>= |GE |

|Less than or equal to |<= |LE |

- For example, the following statement flags patients whose BMI is 30 or more.

if bmi >= 30 then obese = 'Y';

- If the condition is met the patient will have a ‘Y’ value in the obese variable, and a missing value otherwise.

- Missing values for character variables are represented by a blank in SAS.

- Using the following code, we may observe if the new blood pressure medication improved the patients' readings by more than 15 units.

if (pre_bp - post_bp) > 15 then improved = 1;

- To obtain a printout of the “improved” patients, we use proc print with the additional where statement.

proc print data=patientdata2;

where improved = 1;

run;

- One weakness in using if-then statements in creating the variables obese and improved is that those observations which do not meet the condition are assigned missing values.

- We now see how to assign a different value to those variables when the conditions are not met.

If-then-else statements

- The if-then-else statement has the following basic structure to it.

if <condition> then <action 1>;

else <action 2>;

- It is just like the if-then statement except that an action is also executed when the condition is not met.

- The else subcommand issues instructions if the condition in the if-then statement is false.

- To assign a zero, instead of a missing value, to the improved variable if blood pressure did not decrease by more than 15 units we may use the code

if (pre_bp - post_bp) > 15 then improved = 1;

else improved = 0;

- We may use similar code for the obese variable,

if bmi >= 30 then obese = 'Y';

else obese = 'N';

- The variable obese is set to ‘Y’ when the patient’s BMI is greater than or equal to 30, and is set to ‘N’ under all other conditions.

- There is a problem with the above code due to the presence of missing values.

- The above appears as if it would do the job, but when the condition is evaluated for the patient with the missing BMI, the condition will not be met and the patient will have obese assigned ‘N’.

- This is incorrect since we do not know if the person is obese or not.

- We may solve the problem with an if-then-else-if statement which has the structure,

if <condition 1> then <action 1>;

else if <condition 2> then <action 2>;

- The appropriate code to perform the task is below.

if bmi >= 30 then obese = 'Y';

else if 0 < bmi < 30 then obese = 'N';

- One might question why the second condition was not simply bmi < 30. This is because SAS treats a missing numeric value as smaller than any number in a comparison (in effect, negative infinity), and so the individual with the missing BMI would satisfy bmi < 30 and still have obese set to 'N'.
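- The effect of a missing BMI on the two versions of the obese code can be checked with a small data step. The sketch below assumes three bmi values entered with datalines (two from the listing and one missing); the names obese_demo, obese_naive and obese_safe are illustrative only.

```sas
data obese_demo;
    input bmi;
    length obese_naive obese_safe $ 1;

    /* if-then-else: a missing bmi fails bmi >= 30 and falls through to 'N' */
    if bmi >= 30 then obese_naive = 'Y';
    else obese_naive = 'N';

    /* if-then-else-if: a missing bmi meets neither condition, so the value stays blank */
    if bmi >= 30 then obese_safe = 'Y';
    else if 0 < bmi < 30 then obese_safe = 'N';

    datalines;
31.6643
22.3781
.
;
run;

proc print data=obese_demo;
run;
```

- The first two rows agree (Y and N under both versions). In the third row obese_naive is 'N' while obese_safe is blank, which is the correct result when the BMI is unknown. An alternative guard for the second branch is else if not missing(bmi) then obese_safe = 'N'; using the SAS missing function.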

- The if-then-else-if structure may continue on for any number of conditions and actions:

if <condition 1> then <action 1>;

else if <condition 2> then <action 2>;

...

else if <condition k> then <action k>;

- Here we use the above type of statement to create a grouping variable based on age:

if 0 < age ................
................
