MERGE Statement - CoAS



MERGE Statement

Merging combines observations from two or more SAS data sets into one SAS data set horizontally.

Syntax: MERGE SAS-data-set-list;

BY variable-list;

One-to-One Merging: If the by variable is not specified, two or more SAS data sets merge to one SAS data set by observation numbers.

Example:

libname cdat '/courses/ddbf9765ba27fe300';

data left_dat;

input week $8.;

cards;

Week1

Week2

Week3

Week4

Week5

;

run;

data right_dat;

input topic $8.;

cards;

Topic 1

Topic 2

Topic 3

Topic 4

Topic 5

;

run;

data all;

merge left_dat right_dat;

run;

proc print data=all;

run;

The output data set is printed as following:

Obs week topic

1 Week1 Topic 1

2 Week2 Topic 2

3 Week3 Topic 3

4 Week4 Topic 4

5 Week5 Topic 5

Data sets with the same variables

When two or more SAS data sets have the same variables, the values of the last SAS data set in the merge statement will over write the early data set, and get into the output SAS data set. If over writing is not what you want, use RENAME data step option to rename one of the input data sets variable.

Example:

data left_dat;

input ID $3. balance 4.;

cards;

001 102

005 89

002 231

004 147

003 192

;

run;

data right_dat;

input Name $ 1-15 @17 balance 4.;

cards;

John Smith 96

Ted Husion 80

Martha Chen 150

Sandy Lee 100

Paul Leny 192

;

run;

data all;

merge left_dat right_dat;

run;

proc print data=all;

run;

The following is the output of data set all:

Obs ID balance Name

1 001 96 John Smith

2 002 80 Ted Husion

3 003 150 Martha Chen

4 004 100 Sandy Lee

5 005 192 Paul Leny

Match-Merging:

Merging with a by statement allows match observations according to the values of the by variables. Before the merging, all input data sets must be sorted by the BY variables or KEY variables.

Example:

data left_dat;

input ID $3. balance 4. zip 6.;

cards;

001 102 16431

005 89 46298

002 231 98704

004 147 42316

003 192 44765

007 479 21496

;

run;

data right_dat;

input Name $ 1-15 @17 balance 4. @23 ID $3.;

cards;

Sandy Lee 100 004

Paul Leny 192 003

John Smith 96 001

Ted Husion 80 005

Martha Chen 150 002

Jason Tod 244 006

;

run;

proc sort data=left_dat;

by id;

run;

proc sort data=right_dat;

by id;

run;

data all;

merge left_dat right_dat;

by id;

run;

proc print data=all; run;

The following is the output of data set all:

Obs ID balance zip Name

1 001 96 1643 John Smith

2 002 150 9870 Martha Chen

3 003 192 4476 Paul Leny

4 004 100 4231 Sandy Lee

5 005 80 4629 Ted Husion

6 006 244 . Jason Tod

7 007 479 2149

IN= option :

In the above example, the observation number 6 and 7 are contributed from one of the two datasets. The IN= option create a variable that can identify whether the data set has contribution to the output.

Example: In the above example, we add another variable ‘source’, and use IN= option to identify the contribution from each input dataset:

data all;

length source $8;

merge left_dat(in=in1) right_dat(in=in2);

by id;

if in1 and in2 then source='Both';

else if in1 then source='Left';

else source='Right';

run;

proc print data=all; run;

The Following is the output of data all:

Obs source ID balance zip Name

1 Both 001 96 1643 John Smith

2 Both 002 150 9870 Martha Chen

3 Both 003 192 4476 Paul Leny

4 Both 004 100 4231 Sandy Lee

5 Both 005 80 4629 Ted Husion

6 Right 006 244 . Jason Tod

7 Left 007 479 2149

Example: In the above example, if we want the output data set contains only the observations that have contribution from both input data sets:

data all;

merge left_dat(in=in1) right_dat(in=in2);

by id;

if in1 and in2 ;

run;

ExampleIf we want the output data set contains all observations that have contribution from right_dat input data set:

data all;

merge left_dat(in=in1) right_dat(in=in2);

by id;

if in2 ;

run;

Note: If the purpose of the match merging is one-to-one match merging, the input data set should have no duplicate keys. Therefore, before the merging, the NODUPKEY option might need to be used for proc sort before the merging.

Match-Merging for one-to-many or many-to-one

proc sort data=cdat.sort_data2 out=sort_data;

by type order;

run;

data sum_data(keep=type order sum_bal);

set sort_data;

by type order;

if first.order then sum_bal=bal;

else sum_bal + bal;

if last.order then output;

run;

data final;

merge sort_data sum_data;

by type order;

run;

Important note for one-to-many or many-to-one Merge: If the MANY dataset has the same variable name with the ONE dataset, it has to be renamed or dropped before merge. Otherwise, only the first observation of each by group will be merged in.

Example:

data sort_data2;

set sort_data;

by type order;

if first.type or first.order then sum_bal=bal;

else sum_bal+bal;

run;

data final2;

merge sort_data2 sum_data;

by type order;

run;

Compare with proc SQL method

libname cdat '/courses/ddbf9765ba27fe300';

proc sql;

select industry, ticker, price, sum(price) as sumprice,

mean(price) as meanprice, std(price) as stdprice,

price/sum(price) as pctprice

from cdat.stocks2

group by industry

having mean(price)>50

;

quit;

Note: Proc SQL usually is more convenient in calculation of percentage, group percentage etc. than data steps, such as combined data step and merge, or combined proc summary (or means) and merge.

Data Step Interface

The data step performs analysis within the data step. At the end of the data step, all information will be saved in the output data sets. How do we bring some of information to the outside of the data step, or the next data step without carrying the whole column of the data set?

CALL SYMPUT routine: The SYMPUT routine assigns a value produced in the data step to a macro variable. If the macro variable does not exist, it will be created. If the macro variable does exist, it will be over written. In most cases macro variables created by SYMPUT routine is GLOBAL, i.e. it is available for the whole SAS session.

SYNTAX: CALL SYMPUT(mvname, value);

Note: The SYMPUT routine assigns the value of the macro variable during data setp execution, but the macro variable references resolve during the compilation of a step or global statement used outside a step.Therefore:

• Macro variable generated by the SYMPUT routine cannot be referenced in the same data step.

• To reference a value in a global statement following the data step (for example, a TITLE or FOOTNOTE statement), an explicit RUN statement is required.

To use the macro variable, use macro reference &. For example: a=&mvname;

Example:

data stocks;

set cdat.stocks2 end=last;

if ticker='AOL' then call symput('aol_price',price);

if last then call symput('nobs', _n_); * Try trim(left(_n_)) here;

run;

%put &aol_price &nobs;7

Title "Stock price for &nobs companies";

proc print data=stocks;

run;

data check;

ticker='AOL';

price=&aol_price;

put 'AOL price is ' price;

put "The price for AOL is &aol_price";

put "The stocks data set has &nobs observations";

run;

What’s happened without RUN:

data stocks;

set cdat.stocks2 (obs=5) end=last;

if ticker='AOL' then call symput('aol_price',price);

if last then call symput('nobs', _n_); * ty trim(left(_n_)) here;

* run;

Title "Stock price for &nobs companies";

proc print data=stocks;

run;

In the above example, we commented out the RUN statement in the data step. Also, we added a data step option (obs=5). The title is: Stock price for the 5 companies.

Why It is not working?

data stocks;

set cdat.stocks2;

if ticker='AOL';

call symput('aol_price',price);

new_price=&aol_price; * using the macro variable in the same data step is wrong;

put 'The new price is ' new_price;

run;

Running the above code in a new SAS session, you will get ERROR message for the line:

new_price=&aol_price; Because at compile time, the compiler does not recognize &aol_price as an macro variable. If run the above code in an existing SAS session and the macro variable &aol_price is already exist in the session, change the value for price, you will see that the first run after the change the price is still the old value. Start from the second run, the price is the new value.

Example for using call symput: An efficient way to get number of observations in a SAS data set:

DATA _NULL_;

SET cdat.stocks2 NOBS=obsnumber;

CALL SYMPUT('numobs',trim(left(obsnumber)));

STOP;

RUN;

%put &numobs;

%put Data set stocks has &numobs observations;

Note: %PUT statement put global macro variable’s value into SAS LOG.

Example: Calculate the percent of total:

data random(drop=i);

do i=1 to 10; * Preparation: Create a random number data;

balance=ceil(ranuni(1)*10);

output;

end;

run;

data _null_; * Calculate the total and create a macro variable;

set random end=eof;

tot+balance;

if eof then call symput('total',tot);

run;

data cal_pct; * Calculate percent;

set random;

pct=balance/&total;

format pct percent8.2; * Note: percent6.2 is not working, so make it longer to 8.2;

run;

proc print data=cal_pct;

run;

CALL SYMPUTX() Routine: Assigns a value to a macro variable and removes both leading and trailing blanks.

data _null_;

pi=3.1416;

call symput('pi', pi);

call symputx('pi2', pi);

run;

%put pi=π %put; %put; %put pi2=&pi2;

Directing Output To External Files

FILE statement: FILE statement specifies the current output file for put statement.

Example: The following program use the FILE statement direct the output of the put statement to an external text file.

libname cdat '/courses/ddbf9765ba27fe300';

data _null_;

set cdat.stocks;

file '/home/yihong1/temp/stocks.txt'; * Output go to this text file;

put @1 ticker 5. -R

@8 price dollar8.2 -R; * -R make the output right aliened;

run;

Example: The above can also be written as the following way.

filename outfile '/home/yihong1/temp/stocks.txt';

data _null_;

set cdat.stocks;

file outfile; * Output go to this text file;

put @1 ticker 5. -R

@8 price dollar8.2 -R; * -R make the output right aliened;

run;

Details of file statement will be given in data _null_ / PUT reporting session later.

PROC PRINTTO procedure: The PRINTTO procedure defines destinations for SAS procedure output and for the SAS LOG.

Example: Direct print output to external file.

filename out1 'c:\sas_class\temp\date_time.txt'; * Specify output file;

proc printto print=out1 new; * Direct output to file out1. Without new option, append;

run;

proc print data=cdat.date_time; run;

proc printto; * Reset proc printto to default, usually it is output windows;

run;

Example: Direct both print output and log to external files.

filename out1 '/home/yihong1/temp/date_time.txt';

filename log1 '/home/yihong1/temp/date_time.log';

libname cdat '/courses/ddbf9765ba27fe300';

proc printto print=out1 log=log1 new;

run;

proc print data=cdat.date_time;

run;

proc printto;

run;

Miscellaneous

X command: To execute a single host system operating system command.

Syntax: X ‘operating-system-command’;

Example;

x 'cd /home/yihong1/temp';

x 'mkdir jk';

x 'rmdir jk';

The above command requires user manually close the host system window. To automate the process, use NOXWAIT system option:

Options noxwait;

x 'cd /home/yihong1/temp';

x 'mkdir jk';

x 'rmdir jk';

Example: To start an Excel spread sheet:

x 'cd c:\sas_class\classdata';

x 'stock.xls';

Run Excel macro from SAS:

options noxwait noxsync;

x 'cd C:\sas_class\classdata';

x 'dow_history.xls';

filename EXCEL DDE 'EXCEL|SYSTEM';

data _null_;

file excel;

put '[RUN("macro2")]';

run;

To interrupt a running SAS process: use -sas myprog.sas then batch run.

> sas myprog.sas & then batch background run.

> nohup sas myprog.sas & then batch background run and do not hang up.

Frequently Used SAS Procedures

The FREQ procedure

The FREQ procedure produces one-way to n-way frequency crosstabulation tables.

Syntax:

PROC FREQ ;

BY variables;

TABLES requests ;

WEIGHT variable;

Run;

Example: Without TABLES statement, all variables frequency will be calculated without crosstabulation.

proc freq data=cdat.sort_data2;

run;

Example: Two one way frequency tables will be generated.

*options date number;

options nodate nonumber;

proc freq data=cdat.sort_data2;

tables type order;

Title 'One way frequency for two variables';

run;

Example: A two way frequency tables will be generated.

options nodate nonumber;

proc freq data=cdat.sort_data2;

tables type * order;

Title 'Two way crosstabulation';

run;

The rules of request for table statement are as following:

Tables a*(b c) ; ( tables a*b a*c;

Tables (a b)*(c d); ( tables a*c b*c a*d b*d;

Tables (a b c)*d; ( tables a*d b*d c*d;

Tables (a—c); ( tables a b c;

Tables (a—c)*d; ( tables a*d b*d c*d

Tables _numeric_; ( one way freq for all numeric variables.

Tables _Character_; ( one way freq for all character variables.

Example:

options nodate nonumber;

proc freq data=cdat.sort_data2;

tables type * order /list;

Title 'Two way crosstabulation with list option';

run;

Note: List option prints two-way to n-way tables in a list format rather than as crosstabulation tables.

Example:

proc freq data=cdat.stocks2;

table industry / missing;

Title 'One way frequency table with missing option';

run;

Note: Missing option interprets missing values as non-missing and includes them in calculations of percentage and other statistics.

Example: Direct output to a SAS data set

options nodate nonumber;

proc freq data=cdat.sort_data2;

tables type * order / out=freq_out;

Title 'Two way crosstabulation with out= option';

run;

ODS Basics:

To find out the object name for the output sas dataset, add ods trace on before the proc procedure, then find the name from the log.

ods trace on;

proc freq data=cdat.sort_data2;

tables type * order;

Title 'Two way crosstabulation with out= option';

run;

The following is the sas log output:

Output Added:

-------------

Name: CrossTabFreqs

Label: Cross-Tabular Freq Table

Template: Base.Freq.CrossTabFreqs

Path: Freq.Table1.CrossTabFreqs

-------------

NOTE: There were 36 observations read from the data set CDAT.SORT_DATA2.

NOTE: PROCEDURE FREQ used (Total process time):

real time 0.03 seconds

cpu time 0.01 seconds

From the log, we can see that the name for Cross-Tabular Freq Tables is: CrossTabFreqs in the line begins with: Name:

Then, to create the sas output dataset, replace ods trace on with the following statement: ods output CrossTabFreqs=cross_freq;

Where the output sas dataset name is cross_freq. You can change this name with any legal sas dataset name.

ods output CrossTabFreqs=cross_freq;

proc freq data=cdat.sort_data2;

tables type * order;

Title 'Two way crosstabulation with out= option';

run;

Note: PROC FREQ has many other options and can perform many statistical analysis.

For detail, see: SAS Procedure Guide

The MEANS Procedure

The MEANS procedure produces simple univariate descriptive statistics for numeric variables.

PROC MEANS ;

VAR variable-list;

CLASS variable-list;

FREQ variable;

WEIGHT variable;

ID variable-list;

BY variable-list;

OUTPUT

;

RUN;

Option-list:

The following is a few frequently used options:

DATA=SAS-data-set

MISSING: Treat missings as valid subgroup values for the CLASS variables.

MAXDEC=: Maximum number of decimal places (0 to 8) in printing results.

NOPRINT: Suppresses printing of all the descriptive statistics. Used when the

Only purpose is to create a SAS data set.

NWAY: Specify that statistics be output for only the observation with the highest

_TYPE_ value (highest level of interaction among CLASS variables).

VAR statement: List variables that statistics want to be calculated. If a VAR statement is not used, all numeric variables in the input data set, except for those listed in BY, CLASS, FREQ, ID, or WEIGHT statements, are analyzed.

CLASS statement: Segmentation variable list. Can be numeric or character, each has a small number of discrete values. Class variables do not need to be sorted. Can be used with BY variables together, each has a separate variable list.

BY statement: BY group variable list. It is similar to the CLASS variables. The main difference is the output format.

FREQ statement: Each observation in the input data set is assumed to represent m observations in the calculation of statistics, where n is the value of the FREQ variable.

ID Statement: Not used often. Self study as needed.

OUTPUT statement: Request output statistics to a new SAS data set.

OUT=SAS-data-set; Example: OUT=cdat.mean_balance;

Output-statistic-list: list of statistics to calculate and the corresponding output

variable names. Example:

mean=mean_bal std=std_bal sum=sum_bal

The available statistics including: N NMISS MIN MAX RANGE

SUM SUMWGT MEAN CSS USS VAR STD STDERR CV

SKEWNESS KURTOSIS T PRT

Example 1: Simple means using class.

libname cdat '/courses/ddbf9765ba27fe300';

proc means data=cdat.balance;

class type ;

var bal;

run;

The SAS System

The MEANS Procedure

Analysis Variable : bal

N

type Obs N Mean Std Dev Minimum Maximum

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

A 120 120 497.4250000 279.0705059 15.0000000 996.0000000

B 80 80 468.0875000 292.2642810 8.0000000 980.0000000

C 200 200 535.1750000 284.5463716 4.0000000 996.0000000

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

Example 2: Simple means with by.

proc sort data=cdat.balance out=balance;

by type;

run;

proc means data=balance;

by type ;

var bal;

run;

The SAS System

---------------------------------- type=A -----------------------------------

The MEANS Procedure

Analysis Variable : bal

N Mean Std Dev Minimum Maximum

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

120 497.4250000 279.0705059 15.0000000 996.0000000

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

---------------------------------- type=B -----------------------------------

Analysis Variable : bal

N Mean Std Dev Minimum Maximum

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

80 468.0875000 292.2642810 8.0000000 980.0000000

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

---------------------------------- type=C -----------------------------------

Analysis Variable : bal

N Mean Std Dev Minimum Maximum

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

200 535.1750000 284.5463716 4.0000000 996.0000000

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

Example 3: With noprint option and output send to a SAS data set.

proc means data=cdat.type_missing noprint;

class type ;

var bal sale;

output out=type_out;

run;

Note: NOPRINT will cause the result not print to the output window.

Example 4: With missing option and output send to a SAS data set

proc means data=cdat.type_missing noprint missing;

class type ;

var bal sale;

output out=type_out_missing;

run;

Note: Missing option will cause proc means to treat missing values as valid subgroup values for the class variables.

Example: With NWAY option.

proc means data=cdat.type_missing noprint nway;

class type ;

var bal sale;

output out=type_out;

run;

Note: NWAY option cause the statistics to be output for only the observation with the highest type value.

Example: Specify statistics key words.

proc means data=cdat.type_missing noprint nway;

class type ;

var bal sale;

output out=type_out sum=;

run;

Note: When output variable name is not specified after sum=, variable names in the var statement will be the output variable names, but the value is summed value.

Example: Specify statistics key words an output variable names.

proc means data=cdat.type_missing noprint nway;

class type ;

var bal sale;

output out=type_out sum=bal_sum sale_sum mean=bal_mean sale_mean;

run;

Note: To specify multiple statistics, output variable names must be specified. Otherwise,

Only one statistics can be specified. The following example will get ERROR message:

proc means data=cdat.type_missing noprint nway;

class type ;

var bal sale;

output out=type_out sum= mean=;

run;

WARNING: Variable bal already exists on file WORK.TYPE_OUT.

WARNING: Variable sale already exists on file WORK.TYPE_OUT.

WARNING: The duplicate variables will not be included in the output data set

of the output statement number 1.

List of Available Statistic Keywords

N NMISS MIN MAX RANGE SUM

SUMWGT MEAN USS CSS VAR STD

STDERR CV SKEWNESS KURTOSIS T PRT

An example with weight variable by Jay Wang:

The SUMMARY Procedure

The PROC SUMMARY computes descriptive statistics on numeric variables in a SAS data set and outputs the results to a new SAS data set. The procedure is similar to the MEANS procedure. The main difference is that MEANS procedure by default prints the results to the output window, while the SUMMARY procedure does not produce printed output.

SUMMARY statement:

PROC SUMMARY ;

VAR variable-list;

CLASS variable-list;

FREQ variable;

WEIGHT variable;

ID variable-list;

By variable-list;

Output

;

Example: Without noprint option, but the output goes to the output SAS data set, not

output window.

proc summary data=cdat.type_missing nway;

class type order;

var bal sale;

output out=type_out sum=bal_sum sale_sum mean=bal_mean sale_mean;

run;

For other examples, see the EXAMPLES in PROC FREQ section.

The UNIVARIATE Procedure

The UNIVARIATE procedure produces simple descriptive statistics for numeric variables. The main difference of this procedure with others is that it provide a greater details on the distribution of a variable. The PROC UNIVARIATE can provide the following:

• Details on the extreme values of a variable

• Quantiles, such as the median

• Frequency tables

• Several plots to illustrate the distribution

• Paired comparison tests

• Tests of central location

• A test to determine whether the data are normally distributed.

PROC UNIVARIATE ;

VAR variable-list;

By variable-list;

FREQ variable;

WEIGHT variable;

ID variable-list;

Output

;

Example: UNIVARIATE with FREQ, PLOT, and NORMAL options and output data.

proc UNIVARIATE data=cdat.type_missing freq plot normal;

* by order;

var bal;

output out=univ mean=bal_mean median=bal_median pctlpts=25 75 pctlpre=a b;

run;

Example: Use Proc Univariate to create histogram plot:

data normal_dist;

do i=1 to 1000;

rand = 100 * rannor(0);

output;

end;

run;

proc univariate data=normal_dist;

var rand;

histogram;

run;

The RANK Procedure

The RANK procedure computes ranks for one or more numeric variables of a SAS data set. The ranks are output to a new SAS data set.

Specifications:

PROC RANK ;

BY variable-list;

RANKS new-variable-list;

VAR variable-list;

Example: Group variable BAL into approximately 5 equal groups by rank.

proc rank data=cdat.type_missing out=rank_out groups=10;

var bal;

ranks rankbal;

run;

proc sort data=rank_out;

by rankbal;

run;

proc freq data=rank_out;

tables rankbal;

run;

Example: Group variable BAL into approximately 10 equal groups for each type by rank.

proc sort data=cdat.type_missing out=type_missing;

by type;

run;

proc rank data=type_missing out=rank_out groups=10;

by type;

var bal;

ranks rankbal;

run;

proc sort data=rank_out;

by type rankbal;

run;

proc freq data=rank_out;

tables type*rankbal; * / missing;

run;

The TRANSPOSE Procedure

The TRANSPOSE procedure transposes the SAS data set from observations (row) to variables (column) or vice versa.

Specifications:

PROC TRANSPOSE ;

VAR variable-list;

ID variable;

IDLABEL variable;

COPY variable-list;

BY variable-list;

VAR statement: List variables to transpose. If omitted, all numeric variables will be transposed. If not omitted, only those variables listed in the VAR statement will be transposed.

ID statement: Specifies a variable in the input data set. PROC TRANSPOSE uses the formatted values of the ID variable as the name if the transposed variables in the output data set.

• Each formatted ID value should occur only once in the input data set, or if BY statement is used, only once in each BY group.

• If the value of the ID variable is not a legal SAS variable name, the following change will be made by the SAS system:

“+” ( “P”, “-“(“N” “.” ( “D”

Leading Numeric number ( “_”

IDLABEL statement: If ID statement is used, you can use IDLABEL statement to specify a label for the transposed variables in the output data set.

COPY statement: Copies all variable values in the COPY statement directly from input data set to the output data set without transposing them.

BY Statement: Specifies BY variable list. The BY variables have to be sorted.

Example: Transpose by default, only numeric variable is transposed.

proc transpose data=cdat.stocks2 out=stock_tran;

run;

Example: Transpose with specified variable names.

proc transpose data=cdat.stocks2 out=stock_tran;

var ticker industry price;

run;

Example: Transpose with ID specified. Variable ticker’s value becomes variable names in the output.

proc transpose data=cdat.stocks2 out=stock_tran;

var industry price;

id ticker;

run;

Example: A typical application. After the proc means or proc summary, transpose the statistics from columns to rows by type, so that for each type, the statistics are in the same row.

proc means data=cdat.balance;

class type ;

var bal;

output out=balance;

run;

proc transpose data=balance out=h_bal;

by type;

var bal;

id _stat_;

run;

proc sort data=cdat.balance out=balance;

by type;

run;

proc sort data=h_bal;

by type;

run;

data final;

merge balance h_bal;

by type;

run;

The COMPARE Procedure

The compare procedure compares the two SAS data sets, or values of different variables within a single data set.

Specifications:

PROC COMPARE ;

VAR variable-list;

WITH variable-list;

ID variavle-1 variable-n

;

BY variable-list;

VAR: List variables to be compared. If omitted, values of all matching variables except those appearing in BY and ID statements will be compared.

WITH: If the variables you are trying to compare have different names in base and compare data sets, specify variable name of base data with VAR statement, and specify variable name of compare data using WITH statement. The first variable in WITH statement will be compared with the first variable in VAR statement, and the Second variable in WITH statement will be compared with the second variable in VAR statement, and so on.

ID: Lists variables to use to match observations in the base data set with corresponding observations in the comparison data set.

BY: When used, both the base and comparison data sets must be sorted by the BY variables.

Example: Before merge the two datasets, compare the variable attributes first.

proc contents data=cdat.dset1 out=dset1_ct;

proc contents data=cdat.dset2 out=dset2_ct;

run;

proc compare base=dset1_ct

compare=dset2_ct listall;

var name type length format label;

id name;

run;

Example: Compare the variables that hav different name in the two data sets.

proc compare base=cdat.stocks

compare=cdat.stocks2;

where price ne .;

var Very_Long_Variable_Name;

with industry;

run;

Sort first, then compare again:

proc sort data=cdat.stocks out=temp1;

by ticker price;

run;

proc sort data=cdat.stocks2 out=temp2;

by ticker price;

run;

proc compare base=temp1

compare=temp2;

where price ne .;

var Very_Long_Variable_Name;

with industry;

run;

Note: PROC COMPARE is often used for data validation. For example, to compare the result SAS data sets that created by two different methods, or when a new method of analysis is developed, you might compare the result data set with the one that created with the old method to make sure the results are consistent.

proc compare data=inlib.MSMADAT compare=inlib.MSMADAT_csv method=percent criterion=0.0001;

run;

The PRINT Procedure

The PRINT procedure prints a listing of the values of all or some of the variables in a SAS data set. This procedure can be use for simple customized reports.

Specifications:

PROC PRINT

VAR variable-list;

ID variable-list;

BY variable-list;

PAGEBY BY-variable;

SUM BY BY-variable;

SUM variable-list;

Option-List:

DATA= SAS-data-set: Name of SAS data set to print. If omitted, the last SAS data set created will be printed.

DOUBLE (D), Double-space the printed output.

Label (L), Instead of using variable name as column headers, use variable label as column headers. Use SPLIT= to control the split of label.

N, Print the number of observations at the end of print out. If BY statement is used, number of observations in each by group will be printed.

NOOBS, Suppresses the observation number in the output.

ROUND (R), round variable values according to the format statement. If no format specified, round to two decimal places.

SPLIT=’split-character’ or S=’split-character’, specifies character to use for splitting label as column header.

BY Statement: When BY statement is used, the PRINT procedure prints a separate analysis for each BY group. The BY variables is expected to be sorted.

If the input data set is not sorted, you can use the NOTSORTED or DESCENDING option.

ID Statement: When used, the PRINT procedure will use the formatted values of the ID variables to identify observations instead of using the observation numbers.

PAGEBY Statement: Cause the PRINT procedure prints a new page whenever the value of the PAGEBY variable changes or whenever the value of any BY variable listed before it in the BY statement changes.

• PAGEBY variable must be one of the BY variables.

• Example: by a b c; PAGEBY b; In this example, whenever a or b changes, a new page will be started.

SUM Statement: Specify variables whose values are to be summed. If BY statement is used, each BY group will be summed.

SUMBY Statement: Prints subtotals for the BY variable whenever its value changes or whenever the value of any BY variable listed before it in the BY statement changes.

• SUMEBY variable is a numeric variable appearing in the BY statement.

VAR Statement: List variable names to be print.

Example:

options nodate nonumber linesize=80;

proc sort data=cdat.balance_sale2 out=balance_sale2 ;

by type order;

run;

proc print data=balance_sale2 label split='*' noobs; * SPLIT and NOOBS;

var type order sale bal; * Order of the variables also controlled here;

label bal='Month End*Balance'; * Over write the default split, space;

by type order;

pageby order;

Title 'List of Sale and Balance'; * Can be outside and before the PROC;

Title2 'By Type and Order';

Footnote 'As of March 22, 2003';

run;

• Global options can be used to customize the output

NODATE: Do not print the date on top of each page.

DATE: Print the date on top of each page.

NONUMBER: Do not print page number on each page.

NUMBER: Print page number on each page.

NOCENTER: Do not center the output.

CENTER: Center the output.

LINSIZE: Set the line size in number of characters. Range 64-256.

• TITLE and FOOTNOTE can be inside the PROC PRINT, or they can be out side of the PROC PRINT (before the start of the PROC).

• You can have up to 10 titles (TITLE1 – TITLE10) and 10 footnotes (FOOTNOTE1-FOOTNOTE10).

• To Clear TITLE or FOOTNOTE, use the following:

TITLE=’’;

FOOTNOTE=’’;

• WHERE statement can be used in PROC PRINT (not shown in this example.



Example: Using SUMBY statement., the same example by changing ORDERBY to SUMBY.

proc print data=balance_sale2 label split='*' noobs; * SPLIT and NOOBS;

var type order sale bal; * Order of the variables also controlled here;

label bal='Month End*Balance'; * Over write the default split, space;

by type order;

sumby order;

Title 'List of Sale and Balance'; * Can be outside and before the PROC;

Title2 'By Type and Order';

Footnote 'As of March 22, 2003';

run;

The CHART Procedure

The CHART create vertical and horizontal bar charts ( or called histograms), block charts, pie charts, and star charts.

The CHART procedure produces character based low quality chart. It is good for quick analysis, but not good for quality prints and presentation.

Specifications

PROC CHART ;

BY ;

VBAR variable-list ;

HBAR variable-list ;

BLOCK variable-list ;

PIE variable-list ;

STAR variable-list ;

Example: VBAR

proc gchart data=cdat.balance_sale ;

vbar type;

run;

Example: HBAR.

proc gchart data=cdat.balance_sale ;

hbar type;

run;

Note: HBAR gives both bar chart and frequency table. The following is the output from the above HBAR plot:

List of Sale and Balance 68

By Type and Order

21:02 Thursday, March 20, 2003

type Cum. Cum.

Freq Freq Percent Percent



A ‚**************** 120 120 29.63 29.63



B ‚*********** 80 200 19.75 49.38



C ‚*************************** 205 405 50.62 100.00



Šƒƒƒˆƒƒƒˆƒƒƒˆƒƒƒˆƒƒƒˆƒƒƒˆƒƒƒ

30 60 90 120 150 180

Frequency

Example: TYPE= option

proc gchart data=cdat.balance_sale ;

vbar type / type=percent; * Percent;

run;

proc gchart data=cdat.balance_sale ;

vbar type / type=freq; * Frequency;

run;

proc gchart data=cdat.balance_sale ;

vbar type / type=cfreq; * Cumulative frequency;

run;

Example: TYPE= and SUMVAR= options

proc gchart data=cdat.balance_sale ;

vbar type / type=mean sumvar= bal; * SUM bal by TYPE vs. TYPE plot;

run;

proc gchart data=cdat.stocks2;

where price ne .;

vbar industry / type=mean sumvar=price; * SUM price by INDUSTRY vs. INDUSTRY plot;

run;

Example: SUBGROUP= option

proc gchart data=cdat.balance_sale ;

vbar type / type=mean sumvar= bal subgroup=order;

run;

Example: BLOCK chart

proc gchart data=cdat.balance_sale ;

block type / group=order sumvar=bal type=mean;

run;

The PLOT Procedure

The PLOT procedure creates x-y plot for one variable against another. It is also character based quick plot. Good for quick analysis, but not good for quality printing and presentation.

Specifications

PROC PLOT ;

BY variable-list;

PLOT request-list ;

Example: Simple x-y plot

data exp;

do x=1 to 100;

y=x**2;

output;

end;

run;

proc gplot data=exp;

plot y*x;

run;

data sin;

do x=0 to 39;

y=sin(x/6.28);

output;

end;

run;

proc gplot data=sin;

plot y*x;

run;

data show_plot;

set cdat.balance_sale;

forcast=2*bal+200*ranuni(1);

run;

proc gplot data=show_plot;

plot forcast*bal;

run;

Example: With specified symbol and axis scale

proc gplot data=show_plot;

plot forcast*bal='*' / haxis=0 to 1200 by 200

vaxis=0 to 2500 by 500;

run;

Example: Multi y vs. single x with overlay option

data show_plot;

set cdat.balance_sale;

forcast=2*bal+200*ranuni(1);

forcast2=3*bal+100*ranuni(10)+400;

run;

proc gplot data=show_plot;

plot forcast*bal='*' forcast2*bal='o' / overlay box;

run;

Example: The above program can also be written as the following

proc gplot data=show_plot;

plot (forcast forcast2)*bal='*' / overlay box;

run;

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download