TYPE=CORR DATA SETS IN SAS - East Carolina University



TYPE=CORR Data Sets in SAS

There are several special types of data sets available in SAS. One of these is the TYPE=CORR data set. A TYPE=CORR data set may be created from a standard data set by using PROC CORR. Here is an example using my gradebook from PSYC 6430 (December, 1996):

    PROC CORR NOMISS DATA=KLW OUTP=GRADES;
      VAR CLASSWRK MIDTERM FINAL_EX;
    PROC PRINT;
    RUN;

(The PROC PRINT output follows:)

    OBS  _TYPE_  _NAME_    CLASSWRK   MIDTERM  FINAL_EX
      1  MEAN                 94.5333   87.4000   89.0000
      2  STD                   5.4493    7.7993    9.3121
      3  N                    15.0000   15.0000   15.0000
      4  CORR    CLASSWRK    1.0000    0.6585    0.7615
      5  CORR    MIDTERM     0.6585    1.0000    0.5557
      6  CORR    FINAL_EX    0.7615    0.5557    1.0000

The first line in the resulting TYPE=CORR data set has the automatic variable _TYPE_ = “MEAN,” and the variables classwrk, midterm, and final_ex have values equal to the means of those variables. The second line has _TYPE_ = “STD” and classwrk, midterm, and final_ex equal to the standard deviations of those variables. The third has _TYPE_ = “N” and classwrk, midterm, and final_ex equal to the sample sizes. The next three lines contain the correlation matrix. Line 4 has _TYPE_ = “CORR”, _NAME_ = “CLASSWRK”, and classwrk, midterm, and final_ex equal to r11, r12, and r13. Lines 5 and 6 complete the correlation matrix, with appropriate changes in the _NAME_ for each “observation.”

By default, SAS assumes that character variables (including “_NAME_”) have not more than eight characters. Suppose that I had spelled the variable name with nine characters, “classwork,” rather than “classwrk.” SAS would read only the first eight characters of the _NAME_ value, and later, when I referenced the variable “classwork,” SAS would (cryptically) complain that there is no such variable, because the name it stored was “classwor,” not “classwork.” To avoid this problem, use names with not more than eight characters or precede your INPUT statement with a LENGTH statement.
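To make the truncation problem concrete, here is a minimal sketch. The data set name demo and the two-variable matrix are made up for illustration (the correlation is borrowed from the gradebook example); the point is only that, with no LENGTH statement, _NAME_ is read with the default length of eight.

```sas
/* Hypothetical illustration: no LENGTH statement, so _NAME_ defaults to length 8 */
DATA demo(TYPE=CORR);
  INPUT _TYPE_ $ _NAME_ $ classwork midterm;
  CARDS;
CORR classwork 1.0000 0.6585
CORR midterm 0.6585 1.0000
;
RUN;

/* Print the data set to inspect the stored _NAME_ values */
PROC PRINT DATA=demo;
RUN;
```

In the printed data set, _NAME_ on the first row should appear as “classwor,” which no longer matches the variable name classwork.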
In the code below, I set the length of _NAME_ to 11, to accommodate the name “NoTrendSmth.”

    Data climate(type=CORR);
    Length _NAME_ $11;
    Input _Type_ $ _NAME_ $ NoTrendSmth NAO_Index M_Min T_precip;
    Cards;
    N . 431 431 431 431
    Mean . -0.10824 0.00174 32.0 3.44093
    STD . 0.16032 0.98801 5.2788 1.54558
    CORR Notrendsmth 1 -0.11832 0.18098 -0.09132
    CORR NAO_Index -0.11832 1 0.16708 0.01369
    CORR M_Min 0.18098 0.16708 1 0.11090
    CORR T_precip -0.09132 0.01369 0.11090 1
    ;

Many SAS procedures compute the correlation matrix (or something very close to it) as the first step in their data analysis. Often this is the most computationally expensive part of the procedure. If you are working with very large data sets, you can save processing time by inputting the correlation matrix rather than the raw data. For example, suppose that I want first to obtain all the bivariate correlations for variables X1-X50 and then to do several multiple regressions involving these variables. I first use PROC CORR to get the correlations from the raw data and to output the TYPE=CORR data set. I then use the TYPE=CORR data set as the input data set for the multiple regression analyses.

You can save the output correlation file in a SAS system file by giving it a two-level name, for example, “outp=duh.sol.” Here “duh” would first have to be defined as a SAS library, which is a pointer to a location where SAS files are stored. A saved SAS data file is brought back into SAS with the SET statement.

When using PROC CORR to output a TYPE=CORR data set, you will generally need to use the NOMISS option in PROC CORR, which results in the deletion of data from any subject that is missing data on any of the variables. This is highly recommended if you are going to input the correlation matrix to another PROC such as PROC REG. Otherwise the various correlations in the matrix may be based on different subsets of the entire data set, which may lead to biased results or worse.
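The workflow described above (compute the correlations once with NOMISS, save them under a two-level name, and reuse them for several regressions) might look roughly like the sketch below. The library path, the raw data set name bigdata, the outcome variable Y, and the particular predictor subsets are all assumptions made up for illustration; only the two-level name duh.sol comes from the text.

```sas
/* Sketch only: the path, BIGDATA, Y, and the predictor subsets are hypothetical */
LIBNAME duh 'C:\mysasfiles';   /* define the library before using a two-level name */

/* Compute the correlations once, with listwise deletion, and save them permanently */
PROC CORR DATA=bigdata NOMISS NOPRINT OUTP=duh.sol;
  VAR Y X1-X50;
RUN;

/* Reuse the saved TYPE=CORR data set as the input for several regressions */
PROC REG DATA=duh.sol;
  MODEL Y = X1-X10;
  MODEL Y = X11-X20;
RUN;
QUIT;
```

In this sketch PROC REG reads the saved data set directly through its two-level name, so the raw data are processed only once, by PROC CORR.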
PROC REG will accept the default “pairwise” correlation matrix produced if you do not specify NOMISS, printing a warning in the log about unequal n’s in the matrix (unless the n’s happen to come out equal despite some variables having missing data), but the results may be biased.

Creating Your Own Type=Corr Data Set

Sometimes you may have the correlation matrix but not the raw data. For example, you may find a correlation matrix in an article or a book and want to do some analysis on it. You can type the correlations, N’s, means, and standard deviations into a Type=Corr data set of your own creation and then use it in SAS.

If you don’t specify the N’s, SAS will assume 10,000 (at least that was the default the last time I did not specify N; the default has changed with versions), and if you don’t specify means and standard deviations for the variables, SAS will assume mean 0 and standard deviation 1.

Here is a little program that uses the same correlation matrix that was presented in the handout “Using Matrix Algebra to do Multiple Regression.” Copy it into the SAS editor and submit it.

    options pageno=min nodate formdlim='-';
    DATA SOL(TYPE=CORR);
    LENGTH _NAME_ $ 11;
    *Specify the length of the longest variable name, in this case, misanthropy, 11;
    INPUT _TYPE_ $ _NAME_ $ idealism relativism misanthropy gender attitude;
    CARDS;
    corr idealism 1.0000 -0.0870 -0.1395 -0.1011 0.0501
    corr relativism -0.0870 1.0000 0.0525 0.0731 0.1581
    corr misanthropy -0.1395 0.0525 1.0000 0.1504 0.2259
    corr gender -0.1011 0.0731 0.1504 1.0000 -0.1158
    corr attitude 0.0501 0.1581 0.2259 -0.1158 1.0000
    N . 153 153 153 153 153
    mean . 3.64926 3.35810 2.32157 1.18954 2.37276
    std . 0.53439 0.57596 0.67560 0.39323 0.52979
    ;
    *Note the use of a period for missing value for _NAME_ on row of _TYPE_ N, mean, and std.;
    PROC REG; MODEL attitude = idealism -- gender / STB SCORR2; run;

Yes, this can be done with SPSS too.

Related handouts:
Correlation Matrix Input to SPSS
Correlation Matrix, Output to File

Copyright 2019, Karl L. Wuensch, All Rights Reserved