Proc Format Advanced Techniques: Multi-label and Nested ...
嚜燕aper CC10
Proc Format Advanced Techniques: Multi-label and Nested Formats
Tatiana Nevmyrych, PPD, Morrisville, NC
Jennifer Clark, PPD, Morrisville, NC
ABSTRACT
Most of us are aware of basic PROC FORMAT options, like value/invalue to create user-defined formats. But there
are lesser known format options that enhance the flexibility of programs that are often overlooked. The use of a multilabel format in conjunction with the preloadfmt option in PROC MEANS, TABULATE and SUMMARY allows us to
include all possible combinations of formatted class variable values to create group/total counts, as well as to get
counts for every value in a format regardless of if it exists in the data. This is useful for various demographic, or other
types of summaries so frequently utilized in reports in our industry. These format options help to avoid unnecessary
data doubling, and creation of dummy datasets, which saves quite some processing time.
The paper also addresses nested formats (i.e. referring to another existing format when creating new format invalue).
We will illustrate the power of these techniques with examples.
INTRODUCTION
There are always numerous ways to achieve the same result in SAS - some require numerous data/proc steps while
others make your life so much easier if you know the shortcut. Manipulation with formats is one of those shortcuts.
This paper is going to demonstrate how to get counts of categorical data, even if some of the categories do not exist
in the original data (like missing race, gender, subtotals, totals). We will also briefly overview the notion of nested
formats.
MULTI-LABEL FORMATS
Suppose you have some simple demographics data. There are 3 treatment groups in this study, but there are no
patients in the treatment 3 group for this particular subset. Let*s say we need to count patients by gender, race, all 3
treatment groups, and subtotal (treatment 1 and 3) and total (treatments 1, 2 and 3). The frequently used method of
counting subtotals is to double the data as illustrated in Method 1 每 which can be inefficient and time-consuming:
METHOD 1 : DATA DOUBLING
Patient Demographic Data (MYDATA SAS
TM
Dataset)
data addtrt;
SAFE
PT
TRT
SEX
RACE
1
001
1
1
1
1
002
2
2
2
1
003
2
2
3
This code creates a
1
004
2
2
4
1
005
1
1
5
new dataset that is 3
times as large as the
original data.
1
006
2
1
2
1
007
1
2
4
1
008
1
1
5
1
009
2
1
3
1
010
2
1
2
set mydata;
if trt in(1,2,3) then newtrt=trt;
output;
if trt in(1,3) then newtrt=4;
output;
if trt (1,2,3) then newtrt=5;
output;
run;
Obtain
counts
proc means data=addtrt
noprint completetypes nway;
class newtrt;
var safe;
output out=bign
(drop=_type_
_freq_) n=N;
run;
Note that the zero count for treatment
3 is missing. In order to add it to the
output one would have to create a
dummy data set with zero counts and
merge it back with this BIGN output
dataset as illustrated below:
NEWTRT
N
1
4
BIGN
2
6
Output
Dataset
4
4
5
10
data dummy;
do newtrt=1 to 5;
n=0;
output;
end;
run;
data bign;
merge dummy
bign;
by newtrt;
run;
Final BIGN dataset
with all treatments
accounted for in
counts
NEWTRT
N
1
4
2
6
3
0
4
4
5
10
These
are
two
additional steps to
complete
a
simple
task.
Notice that Method 1 required two additional steps to complete a simple task. Fortunately there is a much quicker way
to execute the above task. We will demonstrate it in Method 2, and will also include counts for the sex category as
well.
METHOD 2 : MULTI-LABEL FORMAT WITH PROC MEANS PRELOAD OPTION
Patient Demographic Data (MYDATA SAS
SAFE
PT
TRT
SEX
RACE
1
001
1
1
1
1
002
2
2
2
1
003
2
2
3
1
004
2
2
4
1
005
1
1
5
1
006
2
1
2
1
007
1
1
4
1
008
1
1
5
1
009
2
1
3
1
010
2
1
2
TM
Dataset)
Create Multi-label
Format
proc format;
value
1 =
2 =
3 =
1,3 =
1,2,3 =
trt (multilabel)
'1Treatment 1'
'2Treatment 2'
'3Treatment 3'
'4Subtotal'
'5Total';
value sex
1 = '1Male'
2 = '2Female';
run;
Obtain counts
adding preloadfmt & mlf
options as highlighted
proc means data=dummy
noprint
completetypes nway;
format sex sex. trt trt.;
class trt sex /preloadfmt mlf;
var sex;
output out=bign1
(drop=_type_ _freq_) n=N;
TRT
SEX
1Treatment 1
1Treatment 1
2Treatment 2
2Treatment 2
3Treatment 3
3Treatment 3
4Subtotal
4Subtotal
5Total
5Total
1Male
2Female
1Male
2Female
1Male
2Female
1Male
2Female
1Male
2Female
N
4
0
3
3
0
0
4
0
7
3
BIGN1
Output
Dataset
The preloadfmt option along with mlf generates counts of all values of sex by treatment group defined originally in
proc format even if values of some categories didn*t exist in the original data.
Thus, even though there were no patients in treatment group 3, we still have zero counts for it in the output dataset
because treatment 3 was accounted for in proc format. Likewise we did not have any females in treatment group 1 in
our original data 每 however our final dataset contains a zero count for this category. Hence, there is no need to
create unnecessary dummy datasets with zero counts.
NESTED FORMATS
Starting from SAS V8 nested formats can be used to avoid having multiple copies of codes or look-up values.
In this instance a tumor scan dataset has one record for each patient. The variable RESPONSE identifies the type of
response that was recorded after the scan was done. Meaning of responses can be seen in the response. format.
Values of RESPONSE 1 - 5 are recorded when the scan was actually done, but only 1and 2 qualify for OBJECTIVE
response. We want to create an indicator/count variable for every patient to show whether they had an objective
response and if they had a scan done.
TM
Tumor Scan Data (RESPONSES SAS
PT
Dataset)
proc format;
RESPONSE
001
002
003
004
005
006
007
008
009
010
011
012
1
3
5
2
4
6
3
1
2
4
4
2
value
'1' =
'2' =
'3' =
'4' =
'5' =
other
;
Define Formats
$response
"Complete Response"
"Partial Response"
"Stable Disease"
"Progressive Disease"
"Death"
= "Unknown"
invalue object
'1', '2' = 1
other
= 0
;
invalue scan
'3', '4', '5' = 1
other
= [object.]
;
run;
Apply Formats
data counts;
set responses;
objective = input(response, object.); /* variable to indicate objective
response*/
scan = input(response, scan.); /* variable to indicate if the scan was done*/
Resulting
output
PT
RESPONSE
OBJECTIVE
1
3
5
2
4
6
3
1
2
4
4
2
1
001
002
003
004
005
006
007
008
009
010
011
012
001
SCAN
1
0
0
1
0
0
0
1
1
0
0
1
1
1
1
1
1
1
0
1
1
1
1
1
1
1
The advantage of nested formats in this case, is that it enables code values to be defined once rather than be
repeated in multiple formats. Also note that nesting more than 2 or 3 deep can impact performance - even if you can
keep track of the logic.
CONCLUSION
This paper illustrated just a couple of ways that formats can be used to increase the efficiency, simplicity, and
flexibility of programming. When obtaining counts for categorical data, consider using multi-label formats along with
the preload option in proc means, tabulate or summary to get counts for all categories listed in a defined format 每
including total and subtotal counts. In addition 每 in order to avoid having to repeat values in multiple formats 每
consider using nested formats to save time.
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the authors at:
Tatiana Nevmyrych
PPD
3900N Paramount Pkwy
Morrisville, NC
Email: Tatiana.Nevmyrych@rtp.
Jennifer Clark
PPD
3900N Paramount Pkwy
Morrisville, NC
Email: Jennifer.Clark@rtp.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS
Institute Inc. in the USA and other countries. ? indicates USA registration.
Other brand and product names are trademarks of their respective companies.
................
................
In order to avoid copyright disputes, this page is only a partial summary.
To fulfill the demand for quickly locating and searching documents.
It is intelligent file search solution for home and business.
Related download
- pharmasug china 2017 paper 64 how to read rtf files into
- create a format from a sas data set
- 068 2007 creating a format from raw data or a sas data set
- handling sas formats catalogs across versions
- sugi 27 programming tricks for reducing storage and sas
- some useful techniques of proc format
- using proc datasets for efficient sas processing
- proc format advanced techniques multi label and nested
- proc doc iii self generating codebooks using sas
- 316 2013 maintaining formats when exporting data from sas
Related searches
- private label lotions and soaps
- time management techniques and strateg
- data analysis techniques and methodology
- tqm tools and techniques pdf
- car sales techniques and strategies
- private label shampoo and conditioner
- draw and label water cycle
- abstract painting techniques and ideas
- planning tools and techniques pdf
- food label games and worksheets
- sas proc format library
- why do people stereotype and label others