SAS Regression Using Dummy Variables
/***********************************************
SAS EXAMPLE -- DUMMY VARIABLES IN REGRESSION
FOR BOTH ORDINAL AND NOMINAL
VARIABLES
FILENAME: regress2.sas
************************************************/
/*USE PERMANENT SAS DATA SET CREATED EARLIER*/
libname LABDATA "c:\temp\labdata";
/*CHECK DUMMY VARIABLE CODING*/
proc freq data=labdata.werner;
tables agegrp agedum1 agedum2 agedum3 agedum4;
title "CHECKING DUMMY VARIABLE CODING";
run;
/*CHECK THE RANGE OF AGE WITHIN EACH LEVEL OF AGEGRP*/
proc means data=labdata.werner;
class agegrp;
var age;
run;
/*Create boxplots of cholesterol for each level of agegrp*/
proc sort data=labdata.werner;
by agegrp;
run;
proc boxplot data=labdata.werner;
plot chol*agegrp / boxstyle=schematic;
title "BOXPLOT TO SHOW RELATIONSHIP BETWEEN AGEGRP AND CHOLESTEROL";
run;
/*MODEL WITH AGE DUMMY VARIABLES*/
proc reg data=labdata.werner;
model chol = agedum2 agedum3 agedum4;
AGEDUM: test agedum2, agedum3, agedum4;
output out=regdat1 p=predict1 r=resid1;
plot residual.*predicted.;
title "MULTIPLE REGRESSION WITH DUMMY VARIABLES FOR AGE";
title2 "PLUS A TEST FOR AGE DUMMY VARIABLES";
title3 "REFERENCE AGE IS AGEGRP 1";
run; quit;
/*CHECK NORMALITY OF THE RESIDUALS*/
proc univariate data=regdat1 plot normal;
var resid1;
histogram;
qqplot / normal(mu=est sigma=est);
run;
/*SWITCH REFERENCE GROUP FOR AGE DUMMY VARIABLES*/
proc reg data=labdata.werner;
model chol = agedum1 agedum2 agedum3;
AGEDUM: test agedum1, agedum2, agedum3;
title "MULTIPLE REGRESSION WITH DUMMY VARIABLES FOR AGE";
title2 "PLUS A TEST FOR AGE DUMMY VARIABLES";
title3 "REFERENCE AGE IS AGEGRP 4";
run; quit;
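/*A SKETCH, NOT PART OF THE ORIGINAL PROGRAM: THE SAME MODEL CAN BE
  FIT WITHOUT HAND-CODED DUMMIES BY PUTTING AGEGRP ON A CLASS
  STATEMENT IN PROC GLM. BY DEFAULT PROC GLM SETS THE PARAMETER FOR
  THE LAST LEVEL (HERE AGEGRP 4) TO ZERO, SO THE SOLUTION ESTIMATES
  SHOULD AGREE WITH THE AGEDUM1-AGEDUM3 MODEL ABOVE, AND THE F TEST
  FOR AGEGRP SHOULD CORRESPOND TO THE AGEDUM TEST. THE TITLES ARE
  ILLUSTRATIVE ONLY.*/
proc glm data=labdata.werner;
class agegrp;
model chol = agegrp / solution;
title "ONE-WAY MODEL FOR CHOL USING CLASS AGEGRP";
title2 "REFERENCE AGE IS AGEGRP 4 (LAST LEVEL, PROC GLM DEFAULT)";
run; quit;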
/*INCLUDE AGE DUMMY VARIABLES AND CONTINUOUS COVARIATES*/
proc reg data=labdata.werner;
model chol = agedum2 agedum3 agedum4 calc uric alb wt;
AGEDUM: test agedum2, agedum3, agedum4;
title "MULTIPLE REGRESSION WITH DUMMY VARIABLES FOR AGE";
title2 "PLUS OTHER CONTINUOUS COVARIATES";
title3 "REFERENCE AGE IS AGEGRP 1";
run;quit;
/*****************************************************
ANOTHER EXAMPLE USING DUMMY VARIABLES FOR A NOMINAL
CATEGORICAL VARIABLE: SPECIES
******************************************************/
title;
data kanga;
infile "c:\temp\labdata\kanga.dat" lrecl=80;
input sex 1
species 3
basal_l 5-8
occip_l 10-13
palat_l 15-18
palat_w 20-22
nasal_l 24-26
nasal_w 28-30
squam_d 32-34
lacry_w 36-38
zygom_w 40-43
orbit_w 45-47
rostr_w 49-51
occip_d 53-55
crest_w 57-59
foram_w 61-63
mandi_l 65-68
mandi_w 70-72
mandi_d 74-76
ramus_h 78-80;
/*CREATE ONE DUMMY VARIABLE PER SPECIES; EACH DUMMY STAYS MISSING IF SPECIES IS MISSING*/
if species = 0 then species_dum0 = 1;
if species in (1,2) then species_dum0=0;
if species = 1 then species_dum1 = 1;
if species in (0,2) then species_dum1=0;
if species = 2 then species_dum2 = 1;
if species in (0,1) then species_dum2=0;
run;
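/*A QUICK CHECK OF THE SPECIES DUMMY CODING (A SKETCH ADDED FOR
  ILLUSTRATION, MIRRORING THE PROC FREQ CHECK USED FOR AGEGRP).
  EACH SPECIES SHOULD APPEAR ON EXACTLY ONE ROW OF THE LISTING,
  WITH ONE DUMMY EQUAL TO 1 AND THE OTHER TWO EQUAL TO 0.*/
proc freq data=kanga;
tables species*species_dum0*species_dum1*species_dum2 / list missing;
run;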
/*DESCRIPTIVE STATISTICS FOR ALL NUMERIC VARIABLES BY SPECIES*/
proc means data=kanga;
class species;
run;
proc sort data=kanga;
by species;
run;
proc boxplot data=kanga;
plot crest_w*species / boxstyle=schematic;
run;
/*MODEL WITH SPECIES DUMMY VARIABLES; REFERENCE GROUP IS SPECIES 0*/
proc reg data=kanga;
model crest_w = species_dum1 species_dum2;
plot residual.*predicted.;
output out=kanga_reg1 p=predict r=resid rstudent=rstudent;
run;quit;
proc univariate data=kanga_reg1 plot normal;
var resid;
histogram;
qqplot / normal(mu=est sigma=est);
run;
*************************************************************************************;
proc freq data=labdata.werner;
tables agegrp agedum1 agedum2 agedum3 agedum4;
title "CHECKING DUMMY VARIABLE CODING";
run;
CHECKING DUMMY VARIABLE CODING
The FREQ Procedure
Cumulative Cumulative
AGEGRP Frequency Percent Frequency Percent
------------------------------------------------------------
1 44 23.40 44 23.40
2 46 24.47 90 47.87
3 50 26.60 140 74.47
4 48 25.53 188 100.00
Cumulative Cumulative
AGEDUM1 Frequency Percent Frequency Percent
------------------------------------------------------------
0 144 76.60 144 76.60
1 44 23.40 188 100.00
Cumulative Cumulative
AGEDUM2 Frequency Percent Frequency Percent
------------------------------------------------------------
0 142 75.53 142 75.53
1 46 24.47 188 100.00
Cumulative Cumulative
AGEDUM3 Frequency Percent Frequency Percent
------------------------------------------------------------
0 138 73.40 138 73.40
1 50 26.60 188 100.00
Cumulative Cumulative
AGEDUM4 Frequency Percent Frequency Percent
------------------------------------------------------------
0 140 74.47 140 74.47
1 48 25.53 188 100.00
proc means data=labdata.werner;
class agegrp;
var age;
run;
The MEANS Procedure
Analysis Variable : AGE
N
AGEGRP Obs N Mean Std Dev Minimum Maximum
------------------------------------------------------------------------------------------
1 44 44 21.8181818 1.2440131 19.0000000 24.0000000
2 46 46 28.0434783 2.0758561 25.0000000 31.0000000
3 50 50 36.2400000 3.2486104 32.0000000 41.0000000
4 48 48 47.8333333 4.0070859 42.0000000 55.0000000
------------------------------------------------------------------------------------------
/*Create boxplots of cholesterol for each level of agegrp*/
proc sort data=labdata.werner;
by agegrp;
run;
proc boxplot data=labdata.werner;
plot chol*agegrp / boxstyle=schematic;
title "BOXPLOT TO SHOW RELATIONSHIP BETWEEN AGEGRP AND CHOLESTEROL";
run;
[pic]
proc reg data=labdata.werner;
model chol = agedum2 agedum3 agedum4;
AGEDUM: test agedum2, agedum3, agedum4;
output out=regdat1 p=predict1 r=resid1;
plot residual.*predicted.;
title "MULTIPLE REGRESSION WITH DUMMY VARIABLES FOR AGE";
title2 "PLUS A TEST FOR AGE DUMMY VARIABLES";
title3 "REFERENCE AGE IS AGEGRP 1";
run; quit;
MULTIPLE REGRESSION WITH DUMMY VARIABLES FOR AGE
PLUS A TEST FOR AGE DUMMY VARIABLES
REFERENCE AGE IS AGEGRP 1
The REG Procedure
Model: MODEL1
Dependent Variable: CHOL
Number of Observations Read 188
Number of Observations Used 187
Number of Observations with Missing Values 1
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 3 38114 12705 7.02 0.0002
Error 183 331383 1810.83492
Corrected Total 186 369497
Root MSE 42.55391 R-Square 0.1032
Dependent Mean 235.15508 Adj R-Sq 0.0884
Coeff Var 18.09610
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 218.44186 6.48941 33.66 <.0001
[output truncated]
Kolmogorov-Smirnov D ... Pr > D >0.1500
Cramer-von Mises W-Sq 0.057998 Pr > W-Sq >0.2500
Anderson-Darling A-Sq 0.511199 Pr > A-Sq 0.2024
Variable: resid1 (Residual)
Stem Leaf # Boxplot
12 3 1 0
10 3 1 |
8 1480 4 |
6 233488902889 12 |
4 00123355889334 14 |
2 22333334446788992344567789 26 +-----+
0 122233333467778888899011123688999 33 | + |
-0 888887554432100998777643322111000 33 *-----*
-2 9988776322111009977777666663221 31 +-----+
-4 97666533221087775321100 23 |
-6 6763210 7 |
-8 7 1 |
-10
-12
-14
-16 7 1 0
----+----+----+----+----+----+---
Multiply Stem.Leaf by 10**+1
Normal Probability Plot
[normal probability plot of resid1; vertical axis from -170 to 130, normal quantiles from -2 to +2]
[pic]
[pic]
proc reg data=labdata.werner;
model chol = agedum1 agedum2 agedum3;
AGEDUM: test agedum1, agedum2, agedum3;
title "MULTIPLE REGRESSION WITH DUMMY VARIABLES FOR AGE";
title2 "PLUS A TEST FOR AGE DUMMY VARIABLES";
title3 "REFERENCE AGE IS AGEGRP 4";
run;quit;
MULTIPLE REGRESSION WITH DUMMY VARIABLES FOR AGE
PLUS A TEST FOR AGE DUMMY VARIABLES
REFERENCE AGE IS AGEGRP 4
The REG Procedure
Model: MODEL1
Dependent Variable: CHOL
Number of Observations Read 188
Number of Observations Used 187
Number of Observations with Missing Values 1
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 3 38114 12705 7.02 0.0002
Error 183 331383 1810.83492
Corrected Total 186 369497
Root MSE 42.55391 R-Square 0.1032
Dependent Mean 235.15508 Adj R-Sq 0.0884
Coeff Var 18.09610
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
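Intercept 1 257.16667 6.14213 41.87 <.0001
[output truncated]
Source DF Squares Square F Value Pr > F
Model 7 74986 10712 6.47 <.0001
[output truncated]
Variable DF Estimate Error t Value Pr > |t|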
Intercept 1 -26.00067 67.74900 -0.38 0.7016
AGEDUM2 1 13.64563 8.81812 1.55 0.1236
AGEDUM3 1 20.93826 8.77256 2.39 0.0181
AGEDUM4 1 35.60898 9.03520 3.94 0.0001
CALC 1 23.13575 7.49192 3.09 0.0023
URIC 1 8.62277 2.90619 2.97 0.0034
ALB 1 -4.18820 10.01485 -0.42 0.6763
WT 1 -0.08492 0.16699 -0.51 0.6117
Test AGEDUM Results for Dependent Variable CHOL
Mean
Source DF Square F Value Pr > F
Numerator 3 8892.61133 5.37 0.0015
Denominator 173 1655.47937
Kangaroo data set analysis:
proc means data=kanga;
class species;
run;
The MEANS Procedure
N
species Obs Variable N Mean Std Dev Minimum Maximum
------------------------------------------------------------------------------------------
0 51 sex 51 1.4901961 0.5048782 1.0000000 2.0000000
basal_l 51 1499.39 166.8074433 1090.00 1899.00
occip_l 51 1591.76 161.4247302 1195.00 1925.00
palat_l 51 1035.51 125.6861763 740.0000000 1327.00
palat_w 42 264.6190476 30.2323945 208.0000000 332.0000000
nasal_l 51 706.8627451 89.5468636 493.0000000 905.0000000
nasal_w 51 246.7058824 30.0188568 175.0000000 310.0000000
squam_d 50 179.8600000 25.8575454 122.0000000 265.0000000
lacry_w 51 444.5686275 43.3406299 350.0000000 560.0000000
zygom_w 50 868.9800000 75.0516802 673.0000000 1067.00
orbit_w 50 238.7800000 14.8286678 205.0000000 277.0000000
rostr_w 50 269.9600000 36.1024393 185.0000000 350.0000000
occip_d 50 658.5600000 72.2637347 462.0000000 798.0000000
crest_w 51 108.8627451 39.7839262 21.0000000 203.0000000
foram_w 51 98.7843137 15.6809614 67.0000000 137.0000000
mandi_l 49 1265.16 152.4004247 901.0000000 1648.00
mandi_w 51 135.8039216 13.3611670 101.0000000 174.0000000
mandi_d 51 195.0000000 24.4344020 138.0000000 257.0000000
ramus_h 51 686.6274510 76.1227852 476.0000000 880.0000000
species_dum0 51 1.0000000 0 1.0000000 1.0000000
species_dum1 51 0 0 0 0
species_dum2 51 0 0 0 0
1 48 sex 48 1.5208333 0.5048523 1.0000000 2.0000000
basal_l 47 1476.04 171.0401628 1030.00 1893.00
occip_l 46 1563.24 157.2762869 1121.00 1945.00
palat_l 48 1003.88 129.4495096 665.0000000 1315.00
palat_w 35 245.4285714 33.9179683 172.0000000 319.0000000
nasal_l 47 671.3404255 84.2396366 454.0000000 893.0000000
nasal_w 48 229.2916667 29.4545870 141.0000000 292.0000000
squam_d 48 171.8958333 27.6561252 121.0000000 299.0000000
lacry_w 48 438.4791667 45.3557989 303.0000000 547.0000000
zygom_w 48 850.8333333 71.1517208 640.0000000 994.0000000
orbit_w 48 242.0625000 17.2348068 202.0000000 283.0000000
rostr_w 45 270.4888889 37.1064943 193.0000000 368.0000000
occip_d 41 630.6585366 68.1617964 435.0000000 754.0000000
crest_w 48 115.6250000 42.8344786 13.0000000 216.0000000
foram_w 48 95.1875000 13.8466622 60.0000000 126.0000000
mandi_l 46 1227.50 146.4658853 856.0000000 1568.00
mandi_w 48 134.5208333 12.8708637 101.0000000 163.0000000
mandi_d 48 189.5000000 21.6460895 132.0000000 240.0000000
ramus_h 48 683.8333333 74.3051023 473.0000000 824.0000000
species_dum0 48 0 0 0 0
species_dum1 48 1.0000000 0 1.0000000 1.0000000
species_dum2 48 0 0 0 0
2 52 sex 52 1.5000000 0.5048782 1.0000000 2.0000000
basal_l 52 1501.08 180.1735379 1048.00 1848.00
occip_l 50 1525.58 144.9033584 1145.00 1823.00
palat_l 51 1027.45 133.0591318 693.0000000 1276.00
palat_w 49 257.9795918 32.7461765 182.0000000 328.0000000
nasal_l 51 617.6862745 75.4576677 434.0000000 751.0000000
nasal_w 51 224.7254902 27.5151438 167.0000000 287.0000000
squam_d 52 189.7115385 29.8633072 131.0000000 280.0000000
lacry_w 51 440.6666667 46.3170235 311.0000000 535.0000000
zygom_w 52 912.3846154 78.8762132 725.0000000 1090.00
orbit_w 52 237.6923077 18.5139465 190.0000000 290.0000000
rostr_w 50 274.8000000 36.9986210 173.0000000 371.0000000
occip_d 47 663.0851064 61.1046251 481.0000000 770.0000000
crest_w 50 144.4000000 34.2225609 60.0000000 214.0000000
foram_w 52 89.7115385 14.7679257 48.0000000 128.0000000
mandi_l 44 1258.82 156.6364882 880.0000000 1583.00
mandi_w 50 147.1800000 12.5903915 108.0000000 169.0000000
mandi_d 51 204.3137255 21.1740314 152.0000000 271.0000000
ramus_h 52 728.7500000 80.9923441 511.0000000 880.0000000
species_dum0 52 0 0 0 0
species_dum1 52 0 0 0 0
species_dum2 52 2.0000000 0 2.0000000 2.0000000
------------------------------------------------------------------------------------------
proc sort data=kanga;
by species;
run;
proc boxplot data=kanga;
plot crest_w*species / boxstyle=schematic;
run;
[pic]
proc reg data=kanga;
model crest_w = species_dum1 species_dum2;
plot residual.*predicted.;
output out=kanga_reg1 p=predict r=resid rstudent=rstudent;
run;quit;
The REG Procedure
Model: MODEL1
Dependent Variable: crest_w
Number of Observations Read 151
Number of Observations Used 149
Number of Observations with Missing Values 2
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
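Model 2 35702 17851 11.70 <.0001
[output truncated]
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 108.86275 5.46963 19.90 <.0001
[output truncated]
Kolmogorov-Smirnov D ... Pr > D >0.1500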
Cramer-von Mises W-Sq 0.089723 Pr > W-Sq 0.1557
Anderson-Darling A-Sq 0.54369 Pr > A-Sq 0.1660
Variable: resid (Residual)
Stem Leaf # Boxplot
10 0 1 |
9 34 2 |
8 4 1 |
7 09 2 |
6 0 1 |
5 44556 5 |
4 00123445 8 |
3 224577889 9 |
2 0022233445666888 16 +-----+
1 01112233333466668999 20 | |
0 1345567778999 13 *--+--*
-0 998766665554443210 18 | |
-1 9855420 7 | |
-2 99988877766655322 17 +-----+
-3 98543300 8 |
-4 64310 5 |
-5 421 3 |
-6 866430 6 |
-7 80 2 |
-8 841 3 |
-9 8 1 |
-10 3 1 |
----+----+----+----+
Multiply Stem.Leaf by 10**+1
Variable: resid (Residual)
Normal Probability Plot
[normal probability plot of resid; vertical axis from -105 to 105, normal quantiles from -2 to +2]
[pic]
[pic]