Baoxian Lan, Everest Clinical Research Services Inc., Toronto, Canada ...

Paper PO15

Basic Defensive Programming Techniques Baoxian Lan, Everest Clinical Research Services Inc., Toronto, Canada Daniel Tsui, Everest Clinical Research Services Inc., Toronto, Canada

ABSTRACT

Like defensive driving, defensive programming is to prevent errors and to avoid warning and unexpected results. This paper will demonstrate some basic techniques that we practice in our daily work to prevent error and warning log messages such as: where clause operator requires compatible variables; invalid argument to function xxx; length of class variable xxx truncated to 16, etc.. The paper will also share with you some common techniques to avoid unexpected results. These techniques will be introduced through examples rather than through explanation.

USING MISSING FUNCTION IN WHERE CLAUSE TO PREVENT `WHERE CLAUSE OPERATOR REQUIRES COMPATIBLE VARIABLES' ERROR MESSAGE

`ERROR: Where clause operator requires compatible variables.' is probably one of the most common errors a programmer experiences in his/her daily work. The message appears because the type of the variable used in the WHERE clause and the value specified do not match. To prevent this error, we first need to know the type of the variable and specify the value accordingly. We might want to run a proc contents or check in the table view to figure out the type of the variable. However, if what we want to do is to subset missing or non-missing data in a data step or proc step, then we can ignore the type of the variable by using MISSING function and prevent the error message. See examples below.

149 proc freq data=test;

150

where anvar ne .;

ERROR: Where clause operator requires compatible variables.

151

table anvar*trtgrp/sparse noprint

152

out=_count(keep=anvar trtgrp count);

153 run;

NOTE: The SAS System stopped processing this step because of errors.

WARNING: The data set WORK._COUNT may be incomplete. When this step was

stopped there were 0 observations and 0 variables.

WARNING: Data set WORK._COUNT was not replaced because this step was stopped.

NOTE: PROCEDURE FREQ used:

real time

0.04 seconds

cpu time

0.01 seconds

155 proc freq data=test;

156

where not missing(anvar);

157

table anvar*trtgrp/sparse noprint

158

out=_count(keep=anvar trtgrp count);

159 run;

NOTE: There were 100 observations read from the data set WORK.TEST.

WHERE not MISSING(anvar);

NOTE: The data set WORK._COUNT has 4 observations and 3 variables.

NOTE: PROCEDURE FREQ used:

real time

0.16 seconds

cpu time

0.01 seconds

ADDING DOUBLE QUESTION MARKS (??) IN INPUT FUNCTION TO AVOID INVALID ARGUMENT MESSAGE WHEN CONVERTING CHARACTER VALUES TO NUMERIC VALUES.

When using input function to convert a character date to a SAS date, if the character date is missing or a partial date or there is other special character (e.g. NK) in the character date value, there will be a log message saying `NOTE: Invalid argument to function INPUT at line xx column xx.'. For example,

249 data check;

250

set test;

251

252

start=input(startdat, date9.);

253

stop=input(enddat, date9.);

254 run;

NOTE: Invalid argument to function INPUT at line 253 column 9. PT=2201 startdat= enddat=NK patid=2201 visit=15 drugname= start=. stop=. _ERROR_=1 _N_=17 NOTE: Invalid argument to function INPUT at line 253 column 9. PT=2201 startdat= enddat=NK patid=2201 visit=15 drugname= start=. stop=. _ERROR_=1 _N_=18 ... ERROR: Limit set by ERRORS= option reached. Further errors of this type will not be printed. PT=1805 startdat= enddat=NK patid=1805 visit=15 drugname= start=. stop=. _ERROR_=1 _N_=384 NOTE: Mathematical operations could not be performed at the following places. The results of the operations have been set to missing

values. Each place is given by: (Number of times) at (Line):(Column). 22 at 252:10 346 at 253:9 NOTE: There were 3424 observations read from the data set WORK.TEST.

NOTE: The data set WORK.CHECK has 3424 observations and 8 variables.

NOTE: DATA statement used:

real time

0.23 seconds

cpu time

0.07 seconds

To eliminate such message, we can add two question marks (??) before the date format as shown below.

256 data check;

257

set test;

258

259

start=input(startdat, ?? date9.);

260

stop=input(enddat, ?? date9.);

261 run;

NOTE: There were 3424 observations read from the data set WORK.TEST.

NOTE: The data set WORK.CHECK has 3424 observations and 8 variables.

NOTE: DATA statement used:

real time

0.19 seconds

cpu time

0.01 seconds

USING INPUT TO AVOID CHARACTER VALUES TO NUMERIC VALUES CONVERSION MESSAGE AND PUT FUNCTION TO AVOID NUMERIC VALUES TO CHARACTER VALUES CONVERSION MESSAGE

There are a lot of occasions where we need to convert numeric values to character values or convert character values to numeric values. And there are a lot of ways to do these conversions. However, if we don't converse these values properly, we may get a `numeric values have been converted to character values' or `character values have been converted to numeric values' log message. See two examples below.

34 data _null_;

35

set test;

36

37

call symput('pval', prob);

38 run;

NOTE: Numeric values have been converted to character values at the places given

by: (Line):(Column).

37:24

NOTE: There were 1 observations read from the data set WORK.TEST.

NOTE: DATA statement used:

real time

0.00 seconds

cpu time

0.00 seconds

93 data lab;

94

set lab;

95

by patid labtest visit;

96

97

labvaln=labval*1;

98

99

retain baseline;

100

if first.labtest then baseline=labvaln;

101

102

if visit ne 1 then chg=labvaln - baseline;

103

if labvaln ne . and baseline ne . then pctchg=chg/baseline*100;

104 run;

NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).

97:12

NOTE: Invalid numeric data, labval='+3;+4' , at line 97 column 12.

patid=1001 visit=6 labtest=UPROQLT labval=+3;+4 FIRST.patid=0 LAST.patid=0 FIRST.labtest=0 LAST.labtest=0 FIRST.visit=1 LAST.visit=1

labvaln=. baseline=2 chg=. pctchg=. _ERROR_=1 _N_=200

NOTE: Invalid numeric data, labval='+3;+4' , at line 97 column 12.

patid=1001 visit=7 labtest=UPROQLT labval=+3;+4 FIRST.patid=0 LAST.patid=0 FIRST.labtest=0 LAST.labtest=1 FIRST.visit=1 LAST.visit=1

labvaln=. baseline=2 chg=. pctchg=. _ERROR_=1 _N_=201

NOTE: Missing values were generated as a result of performing an operation on missing values.

Each place is given by: (Number of times) at (Line):(Column).

3006 at 97:18 3256 at 102:35 3475 at 103:53

NOTE: Mathematical operations could not be performed at the following places. The results of the operations have been set to missing values.

Each place is given by: (Number of times) at (Line):(Column).

506 at 103:53

NOTE: There were 21188 observations read from the data set WORK.LAB.

NOTE: The data set WORK.LAB has 21188 observations and 8 variables.

NOTE: DATA statement used:

real time

0.48 seconds

cpu time

0.14 seconds

These messages may not be allowed in your company's best practice or SOPs, especially the ones associate with those `_ERROR_'s. To avoid such messages showing up in the log, we should always use INPUT function to convert character values to numeric values and PUT function to convert numeric values

to character values. See how the above messages have been removed in the following examples.

44 data _null_;

45

set test;

46

47

call symput('pval', compress(put(prob, pvalue8.4)));

48 run;

NOTE: There were 1 observations read from the data set WORK.TEST.

NOTE: DATA statement used:

real time

0.00 seconds

cpu time

0.00 seconds

190 data lab;

191

set lab;

192

by patid labtest visit;

193

194

labvaln=input(labval, ?? best.);

195

196

retain baseline;

197

if first.labtest then baseline=labvaln;

198

199

if visit ne 1 and baseline ne . and labvaln ne . then chg=labvaln -

baseline;

200

if baseline not in (., 0) and chg ne . then pctchg=chg/baseline*100;

201 run;

NOTE: There were 21188 observations read from the data set WORK.LAB.

NOTE: The data set WORK.LAB has 21188 observations and 8 variables.

NOTE: DATA statement used:

real time

0.41 seconds

cpu time

0.14 seconds

USING LOWCASE / UPCASE FUNCTION TO AVOID CASE SENSITIVE ISSUE IN CHARACTER VALUES

We know that character values in SAS are case sensitive. So if the value in the dataset is in lower case and we spell it in upper case or vise versa, we won't see the expected results or we'll get unexpected results. In the following example, we won't get a p-value because `trtgrp' was not spelled exactly as the value in the dataset.

247 data _null_;

248

set test;

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download