EPS 725: Multivariate Statistics

Principal Components Analysis (PCA)

Second Example

Brief Description of Study

A primary purpose of an earlier study by Bridgmon (2007) was to develop a psychometric instrument to measure the stress experienced by a national sample of doctoral students in counselor education, counseling psychology, and clinical psychology programs who had completed all degree requirements except defending the dissertation. She used principal components analysis and confirmatory factor analysis to develop a 53-item instrument, the Bridgmon All But Dissertation Stress Survey (BASS). The following four-component model emerged.

Component 1 (20 items): Chair and Committee Functioning, which measures stress associated with the work, involvement, and interaction of the chair and the committee members.

Component 2 (17 items): Student Organization and Task Commitment, which reflects stress related to the student’s ability to organize the dissertation process, including time and materials, as well as the student’s dedication to completing each dissertation step.

Component 3 (7 items): Statistics and Research Methodology Competence, which reflects a student’s perceived stress about knowing and using statistics and research methods and about having adequate training in these areas.

Component 4 (9 items): Relationship and Financial Functioning, which measures stress associated with finances, difficult relationships, and the loss of family members and friends.

Higher scores on each component indicate greater perceived dissertation-related stress.

In the present study, Bridgmon conducted a cross-validation study with a new national sample of doctoral students to determine whether the four-component structure would replicate. Several analyses were conducted, but our focus here is the PCA. The sample consisted of 194 doctoral students from universities across the nation.

Sample Size

Stevens (1996) presents cases-per-variable ratios ranging from 2:1 to 20:1 as guidelines but states that 5:1 should be the minimum. Interestingly, Stevens also reports the results of a Monte Carlo study by Guadagnoli and Velicer (1988), who recommended that “components with four or more loadings above .60 in absolute value are reliable, regardless of sample size” (p. 372). In this study, the sample of 194 falls short of a 5:1 ratio (5 × 53 variables = 265 cases); the actual ratio is about 3.7:1. However, each component in the first study had more than four loadings above .60, so the sample size was judged appropriate for the analysis.
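The ratio arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the helper names are hypothetical, and the 5:1 default simply encodes the minimum that Stevens (1996) recommends.

```python
def cases_per_variable(n_cases: int, n_variables: int) -> float:
    """Cases-to-variables ratio, e.g. 194 / 53."""
    return n_cases / n_variables


def minimum_sample(n_variables: int, ratio: float = 5.0) -> float:
    """Smallest sample that meets a given cases-per-variable ratio (5:1 by default)."""
    return ratio * n_variables


n_cases, n_items = 194, 53  # cross-validation sample size and BASS item count
print(f"observed ratio: {cases_per_variable(n_cases, n_items):.2f}:1")  # 3.66:1
print(f"5:1 minimum: {minimum_sample(n_items):.0f} cases")              # 265 cases
```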

Data Screening

We will conduct a few data-screening analyses for this problem, but not a comprehensive screen.

Missing Data

Please check for missing data.

Identify the variables that have missing data._____________________________

________________________________________________________________
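Outside SPSS, a per-variable count of missing values is a quick cross-check. This sketch assumes, hypothetically, that each case has been exported as a Python dictionary of item scores with None or NaN marking a missing response; the item names are invented for illustration.

```python
import math
from typing import Dict, List, Optional


def missing_counts(rows: List[Dict[str, Optional[float]]]) -> Dict[str, int]:
    """Count missing values per variable, treating None and NaN as missing.

    Variables with no missing values are left out of the result.
    """
    counts: Dict[str, int] = {}
    for row in rows:
        for name, value in row.items():
            if value is None or (isinstance(value, float) and math.isnan(value)):
                counts[name] = counts.get(name, 0) + 1
    return counts


# Hypothetical item names and scores, for illustration only.
rows = [
    {"item1": 3.0, "item2": None},
    {"item1": None, "item2": 4.0},
    {"item1": 2.0, "item2": float("nan")},
]
print(missing_counts(rows))  # {'item2': 2, 'item1': 1}
```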

Multivariate Outliers

Please check for multivariate outliers.

Identify the critical value χ²(.999) with df = 53, the number of variables: ___________
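The critical value can be read from a chi-square table, or computed as sketched below. This is a minimal standard-library sketch, assuming df equals the number of variables (53 here); it evaluates the chi-square CDF through the power series of the regularized lower incomplete gamma function and inverts it by bisection, which is adequate for moderate df but is not a general-purpose implementation.

```python
import math


def chi2_cdf(x: float, df: int) -> float:
    """Chi-square CDF as the regularized lower incomplete gamma P(df/2, x/2),
    evaluated with its power series (adequate for moderate df)."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a  # n = 0 term of the sum z^n / (a (a+1) ... (a+n))
    total = term
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= z / (a + n)
        total += term
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


def chi2_critical(p: float, df: int) -> float:
    """Value c with P(chi-square_df <= c) = p, found by bisection."""
    lo, hi = 0.0, df + 10.0 * math.sqrt(2.0 * df) + 50.0  # bracket well past p = .999
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(f"{chi2_critical(0.999, 53):.3f}")
```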

Identify the case number and Mahalanobis distance value for each case that is a multivariate outlier.

________________________________________________________________

________________________________________________________________
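For readers who want to see what the Mahalanobis distance measures, the sketch below computes the squared distance D² of each case from the centroid, D² = (x − m)ᵀ S⁻¹ (x − m), using the n − 1 sample covariance matrix S, and flags cases above a critical value such as the χ² value identified above. It assumes, hypothetically, that complete cases are available as lists of item scores; the function names are illustrative.

```python
from typing import List, Tuple


def invert(m: List[List[float]]) -> List[List[float]]:
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    p = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(m)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]


def mahalanobis_d2(rows: List[List[float]]) -> List[float]:
    """Squared Mahalanobis distance of each case from the centroid,
    using the sample (n - 1) covariance matrix."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    inv = invert(cov)
    return [sum(c[i] * inv[i][j] * c[j] for i in range(p) for j in range(p))
            for c in centered]


def flag_outliers(rows: List[List[float]], critical: float) -> List[Tuple[int, float]]:
    """(case number, D^2) for every case whose D^2 exceeds the critical value.
    Case numbers start at 1 and follow the order of `rows`."""
    return [(i + 1, d2) for i, d2 in enumerate(mahalanobis_d2(rows)) if d2 > critical]
```

A useful sanity check: with the n − 1 covariance matrix, the D² values always sum to (n − 1) × p.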

Please delete all cases that are multivariate outliers. Of course, each multivariate outlier should be examined carefully before deciding whether to delete it.

Multicollinearity

Multicollinearity is likely not a problem when tolerance values are above .50 and variance inflation factors (VIFs) are below 10. Serious multicollinearity is more evident when tolerance values are below .10 (Mertler & Vannatta, 2005; Meyers, Gamst, & Guarino, 2006) or as high as .10 to .17 (Keith, 2006). It may also be valuable to examine VIFs of 6 or 7 and compare them with other measures of multicollinearity to see whether there are problems. Keith (2006) states, “Values for the VIF of 6 or 7 may be more reasonable as flags for excessive multicollinearity. These values of the VIF correspond to tolerances of .10 (for a VIF of 10), .14 (VIF of 7), and .17 (VIF of 6), respectively.”
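Tolerance and VIF are reciprocals (tolerance = 1 − R²ⱼ = 1/VIF, where R²ⱼ comes from regressing variable j on the other variables), which is why Keith’s VIF cutoffs translate into the tolerances quoted above. The sketch below shows that conversion and applies this exercise’s cutoffs (tolerance below .17, VIF above 6); the function names are hypothetical.

```python
def tolerance_from_vif(vif: float) -> float:
    """Tolerance and VIF are reciprocals: tolerance = 1 / VIF."""
    return 1.0 / vif


def collinearity_flags(tolerance: float, vif: float,
                       tol_cut: float = 0.17, vif_cut: float = 6.0) -> list:
    """Flags based on this exercise's cutoffs (tolerance below .17, VIF above 6)."""
    flags = []
    if tolerance < tol_cut:
        flags.append(f"tolerance below {tol_cut}")
    if vif > vif_cut:
        flags.append(f"VIF above {vif_cut}")
    return flags


for vif in (10, 7, 6):
    print(f"VIF {vif:>2} -> tolerance {tolerance_from_vif(vif):.2f}")
# VIF 10 -> tolerance 0.10
# VIF  7 -> tolerance 0.14
# VIF  6 -> tolerance 0.17
```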

Please check for multicollinearity.

Several tolerance values are below .50. Please identify all variables with tolerances below .17.

________________________________________________________________

Please identify all variables that have VIF values above 6.0.

________________________________________________________________
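As a cross-check on the tolerance and VIF values, the VIF of each variable equals the corresponding diagonal element of the inverse of the correlation matrix among the variables, and tolerance is its reciprocal. This sketch assumes the correlation matrix of the items is available as a nested list; the function names are illustrative.

```python
from typing import List, Tuple


def invert(m: List[List[float]]) -> List[List[float]]:
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    p = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(m)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]


def vif_and_tolerance(r: List[List[float]]) -> List[Tuple[float, float]]:
    """(VIF, tolerance) for each variable: the VIF of variable j is the j-th diagonal
    element of the inverse correlation matrix, and tolerance is 1 / VIF."""
    inv = invert(r)
    return [(inv[j][j], 1.0 / inv[j][j]) for j in range(len(r))]
```

For two variables correlated .50, for example, each VIF is 1 / (1 − .25) ≈ 1.33 and each tolerance is .75.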

Please identify any dimension with a condition index above 30 that also has at least two variance proportions greater than .50, along with the variables involved.

________________________________________________________________
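Condition indexes and variance proportions come from an eigen-decomposition of the cross-products matrix of the design matrix. The sketch below follows the usual approach (a constant column plus the predictors, each column scaled to unit length but not centered), which we assume mirrors the Collinearity Diagnostics table in SPSS. The condition index of dimension j is √(λmax / λj), and the variance proportion of coefficient k on dimension j is (v²ₖⱼ / λj) divided by its sum over dimensions. It uses cyclic Jacobi rotations for the eigenvalues, assumes no exact linear dependency among the columns, and all names are hypothetical.

```python
import math
from typing import List


def jacobi_eigen(a: List[List[float]], tol: float = 1e-20, max_sweeps: int = 100):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [list(map(float, row)) for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p, q (A <- A P)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p, q (A <- P^T A)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors (V <- V P)
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def collinearity_diagnostics(predictors: List[List[float]]):
    """Eigenvalues, condition indexes, and variance proportions, sorted by descending
    eigenvalue. props[k][j] is coefficient k's share on dimension j (k = 0 is the constant)."""
    n = len(predictors[0])
    cols = [[1.0] * n] + [list(map(float, col)) for col in predictors]
    scaled = []
    for col in cols:  # scale every column, including the constant, to unit length
        norm = math.sqrt(sum(x * x for x in col))
        scaled.append([x / norm for x in col])
    m = len(scaled)
    xtx = [[sum(scaled[i][r] * scaled[j][r] for r in range(n)) for j in range(m)]
           for i in range(m)]
    eig, vecs = jacobi_eigen(xtx)
    order = sorted(range(m), key=lambda j: eig[j], reverse=True)
    eig = [eig[j] for j in order]
    vecs = [[row[j] for j in order] for row in vecs]
    cond = [math.sqrt(eig[0] / lam) for lam in eig]
    props = []
    for k in range(m):
        phi = [vecs[k][j] ** 2 / eig[j] for j in range(m)]
        total = sum(phi)
        props.append([f / total for f in phi])
    return eig, cond, props


def flag_dimensions(cond, props, ci_cut: float = 30.0, prop_cut: float = 0.50):
    """(dimension number, condition index, coefficient indexes) for dimensions with a
    condition index above ci_cut and at least two proportions above prop_cut."""
    flagged = []
    for j, ci in enumerate(cond):
        high = [k for k in range(len(props)) if props[k][j] > prop_cut]
        if ci > ci_cut and len(high) >= 2:
            flagged.append((j + 1, ci, high))
    return flagged
```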

Principal Components Analysis

Please conduct a principal components analysis using principal components as the extraction method, extracting 4 components, with a varimax rotation. Choose Analyze>Data Reduction>Factor. After moving the variables into the Variables box, click the Descriptives button, select Initial solution, Coefficients, and KMO and Bartlett’s test of sphericity, and click Continue. Click the Extraction button, leave Principal components as the Method and keep the other defaults, except check Scree plot, click Number of factors, type 4, and click Continue. Click the Rotation button, select Varimax, and click Continue. Click the Options button, select Sorted by size and Suppress absolute values less than, type .32, and click Continue and then OK.
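Principal components extraction amounts to an eigen-decomposition of the correlation matrix: the eigenvalues are the component variances, and the unrotated loadings are each eigenvector scaled by the square root of its eigenvalue. The sketch below shows that step only (the varimax rotation is left to SPSS); it uses cyclic Jacobi rotations, flips eigenvector signs so each column of loadings sums to a positive value (an arbitrary convention), and the function names are hypothetical.

```python
import math
from typing import List


def jacobi_eigen(a: List[List[float]], tol: float = 1e-20, max_sweeps: int = 100):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [list(map(float, row)) for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p, q (A <- A P)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p, q (A <- P^T A)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors (V <- V P)
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def correlation_matrix(columns: List[List[float]]) -> List[List[float]]:
    """Pearson correlations among variables supplied as columns of case values."""
    n = len(columns[0])
    z = []
    for col in columns:
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        z.append([(x - mean) / sd for x in col])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


def principal_components(r: List[List[float]], n_components: int):
    """Eigenvalues (descending) and unrotated loadings[i][c] for the first n_components."""
    eig, vecs = jacobi_eigen(r)
    order = sorted(range(len(eig)), key=lambda j: eig[j], reverse=True)
    p = len(r)
    cols = []
    for j in order[:n_components]:
        col = [vecs[i][j] * math.sqrt(eig[j]) for i in range(p)]
        if sum(col) < 0:  # eigenvector signs are arbitrary; make each column sum positive
            col = [-x for x in col]
        cols.append(col)
    loadings = [[cols[c][i] for c in range(n_components)] for i in range(p)]
    return [eig[j] for j in order], loadings
```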

Factorability

Interpret the bivariate correlations.

________________________________________________________________

________________________________________________________________

Interpret the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy.

________________________________________________________________

________________________________________________________________
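The overall KMO compares the size of the observed correlations with the size of the partial (anti-image) correlations: KMO = Σr²ᵢⱼ / (Σr²ᵢⱼ + Σa²ᵢⱼ) over all i ≠ j, where aᵢⱼ = −sᵢⱼ / √(sᵢᵢ sⱼⱼ) and S is the inverse of the correlation matrix. The sketch below computes that overall value from a correlation matrix; the function names are hypothetical.

```python
from typing import List


def invert(m: List[List[float]]) -> List[List[float]]:
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    p = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(m)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]


def kmo(r: List[List[float]]) -> float:
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    inv = invert(r)
    p = len(r)
    r2 = a2 = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                r2 += r[i][j] ** 2
                partial = -inv[i][j] / (inv[i][i] * inv[j][j]) ** 0.5  # anti-image correlation
                a2 += partial ** 2
    return r2 / (r2 + a2)
```

With only two variables the partial correlation equals the ordinary correlation, so the KMO is always .50; meaningful values require more variables, as in the 53-item BASS.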

Interpret Bartlett’s test of sphericity

________________________________________________________________

________________________________________________________________
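Bartlett’s test of sphericity asks whether the correlation matrix differs from an identity matrix. Its statistic is χ² = −[(n − 1) − (2p + 5)/6] ln|R| with df = p(p − 1)/2, where n is the number of cases analyzed and p the number of variables. This sketch computes ln|R| by Gaussian elimination (working with logarithms because the determinant of a 53 × 53 correlation matrix can be very small); the function names are hypothetical, and the p-value is left to SPSS or a chi-square table.

```python
import math
from typing import List, Tuple


def log_det(m: List[List[float]]) -> float:
    """Natural log of |det(m)| via Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in m]
    p = len(a)
    total = 0.0
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        total += math.log(abs(a[col][col]))
        for r in range(col + 1, p):
            factor = a[r][col] / a[col][col]
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return total


def bartlett_sphericity(r: List[List[float]], n: int) -> Tuple[float, int]:
    """Bartlett's chi-square = -[(n - 1) - (2p + 5) / 6] * ln|R|, with df = p(p - 1) / 2."""
    p = len(r)
    stat = -((n - 1) - (2 * p + 5) / 6.0) * log_det(r)
    return stat, p * (p - 1) // 2
```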

Extraction and Rotation

Interpret the extraction communalities.

________________________________________________________________

________________________________________________________________

________________________________________________________________
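An extraction communality is the sum of a variable’s squared loadings across the retained components. With an orthogonal rotation such as varimax, rotation does not change the communalities, so the unrotated and rotated loadings give the same values, provided the full loadings are used rather than the output with values below .32 suppressed. A minimal sketch, with hypothetical loadings:

```python
from typing import List


def communalities(loadings: List[List[float]]) -> List[float]:
    """Extraction communality of each variable: the sum of its squared loadings
    across the retained components."""
    return [sum(x * x for x in row) for row in loadings]


# Hypothetical loadings for two variables on two retained components.
loadings = [[0.80, 0.30], [0.60, 0.00]]
print([round(h2, 2) for h2 in communalities(loadings)])  # [0.73, 0.36]
```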

Interpret the Rotated Sums of Squared Loadings from the Total Variance Explained Table

________________________________________________________________

________________________________________________________________

________________________________________________________________

________________________________________________________________
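The Rotation Sums of Squared Loadings columns can be reproduced from the full rotated loadings: each component’s sum of squared loadings, that sum as a percentage of the total variance (p, because each standardized variable contributes one unit), and the cumulative percentage. A minimal sketch with a hypothetical function name:

```python
from typing import List, Tuple


def variance_explained(loadings: List[List[float]]) -> List[Tuple[float, float, float]]:
    """(sum of squared loadings, % of variance, cumulative %) for each component.
    Each standardized variable contributes one unit of variance, so % = SS / p * 100."""
    p = len(loadings)
    k = len(loadings[0])
    out, cumulative = [], 0.0
    for c in range(k):
        ss = sum(row[c] ** 2 for row in loadings)
        pct = 100.0 * ss / p
        cumulative += pct
        out.append((ss, pct, cumulative))
    return out
```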

Interpret the Scree Plot

________________________________________________________________

________________________________________________________________

________________________________________________________________
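A scree plot graphs each component’s eigenvalue against its component number; the point where the curve flattens (the elbow) suggests how many components stand out from the rest. The text-only version below prints a bar per component along with the drop from the previous eigenvalue, which can make the elbow easier to locate. The eigenvalues in the example are hypothetical.

```python
from typing import List


def text_scree(eigenvalues: List[float], width: int = 40) -> List[str]:
    """One line per component: a bar proportional to its eigenvalue,
    the eigenvalue, and the drop from the previous component."""
    top = max(eigenvalues)
    lines = []
    for i, ev in enumerate(eigenvalues, start=1):
        bar = "#" * round(width * ev / top)
        drop = "" if i == 1 else f"  (drop {eigenvalues[i - 2] - ev:.2f})"
        lines.append(f"{i:>3} {bar:<{width}} {ev:.2f}{drop}")
    return lines


for line in text_scree([3.0, 1.5, 0.5], width=30):  # hypothetical eigenvalues
    print(line)
```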

Interpret the Rotated Component Matrix

If more than one variable loads on a component, interpret the item with the highest loading on that component.
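The counts requested below, and the highest-loading item for each component, can be tallied from the rotated loadings as sketched here. The sketch counts absolute loadings of at least .32, matching what SPSS still displays after suppressing absolute values less than .32; the item names and loadings in the test are hypothetical.

```python
from typing import List, Tuple


def summarize_components(loadings: List[List[float]], names: List[str],
                         cutoff: float = 0.32) -> List[Tuple[int, str, float]]:
    """For each component: the number of variables with |loading| >= cutoff,
    and the name and loading of the variable with the largest |loading|."""
    summary = []
    for c in range(len(loadings[0])):
        column = [row[c] for row in loadings]
        count = sum(1 for x in column if abs(x) >= cutoff)
        best = max(range(len(column)), key=lambda i: abs(column[i]))
        summary.append((count, names[best], column[best]))
    return summary
```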

Component 1: # of loadings >.32 _______

Do the variables that loaded match the description of component 1 in the original study?

________________________________________________________________

________________________________________________________________

Component 2: # of loadings >.32 _______

Do the variables that loaded match the description of component 2 in the original study?

________________________________________________________________

________________________________________________________________

Component 3: # of loadings >.32 _______

Do the variables that loaded match the description of component 3 in the original study?

________________________________________________________________

________________________________________________________________

Component 4: # of loadings >.32 _______

Do the variables that loaded match the description of component 4 in the original study?

________________________________________________________________

________________________________________________________________

Is an orthogonal rotation (components uncorrelated) or an oblique rotation (components correlated) better for this study?

First, from the Rotated Component Matrix, write down the descriptions and loadings of the variables that loaded on more than one component.

_____________________________________________________________

_____________________________________________________________

_____________________________________________________________

_____________________________________________________________

_____________________________________________________________

_____________________________________________________________
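Variables that load on more than one component can be picked out of the full rotated loadings with a short filter like the one below. It uses the same .32 cutoff on absolute loadings as the SPSS display; the function name, item names, and loadings in the test are hypothetical.

```python
from typing import Dict, List, Tuple


def cross_loadings(loadings: List[List[float]], names: List[str],
                   cutoff: float = 0.32) -> Dict[str, List[Tuple[int, float]]]:
    """Variables with |loading| >= cutoff on more than one component, mapped to
    (component number, loading) pairs; components are numbered from 1."""
    result = {}
    for name, row in zip(names, loadings):
        hits = [(c + 1, x) for c, x in enumerate(row) if abs(x) >= cutoff]
        if len(hits) > 1:
            result[name] = hits
    return result
```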

Next, run the PCA again with all settings unchanged except the rotation: click the Rotation button, choose Direct Oblimin instead of Varimax, and click Continue and then OK. A Component Correlation Matrix appears at the end of the output.

Please write down the correlations from the Component Correlation Matrix below.

Conclusion:_______________________________________________________

________________________________________________________________

________________________________________________________________

________________________________________________________________

References

Keith, T. Z. (2006). Multiple regression and beyond. Boston, MA: Allyn and Bacon.

Mertler, C. A., & Vannatta, R. A. (2005). Advanced and multivariate statistical methods. Glendale, CA: Pyrczak Publishing.

Meyers, L. S., Gamst, G., & Guarino, A. J. (2006). Applied multivariate research: Design and interpretation. Thousand Oaks, CA: Sage Publications.
