CORRELATION ANALYSIS




REVIEW QUESTIONS

(A) M/C: Correlation is best suited for quantifying the relation between an X variable that is ________________ [choices: scale, ordinal, nominal] and a Y variable that is  _______________.

(B) What symbol is used to denote the sample correlation coefficient?

(C) What symbol is used to denote the correlation coefficient in the population?

(D) A t statistic for a correlation coefficient has  ______ degrees of freedom.

(E) What is "bivariate normality"?

(F) What is the range of possible values for r?

(G) Interpret a correlation coefficient of 0.79. (Assume data are linear.)

(H) A correlation coefficient of -0.25 indicates a ____________ ______________ correlation. 

(I) Why not calculate r when data are not linear?

(J) Does the population need to be bivariate normal to describe correlation with r? 

(K) List the null and alternative hypotheses for the p value when testing r.

(L) Does the population need to be bivariate normal to test r?

(M) Which statements are true?

    (M1) The value of r measures the strength of the relation between X and Y.

    (M2) The value of r measures the strength of the linear relation between X and Y. 

    (M3) The closer r is to +1, the stronger the linear relation between X and Y. 

    (M4) The closer r is to -1 or  +1, the stronger the linear relation between X and Y. 

    (M5) If r is zero, then you can be certain that X and Y are not related [in any way].

    (M6) If r is zero, then X and Y are not related in a linear way.

    (M7) The value of r changes when the units of measure are changed.

    (M8) The value of r does not change when the units of measure are changed.

(N) Why is it important to scrutinize a scatter plot before calculating r?

(Source: Anscombe, F. J. (1973). Graphs in statistical analysis. The American Statistician, 27, 17-21.) In which data set(s) below will correlation analysis be useful? (Explain your reasoning in each instance.)

(A) Fig. A

(B) Fig. B

(C) Fig. C

(D) Fig. D 

Comment: The numerical value of the correlation coefficient (r) in each instance is 0.82. Clearly, blind use of this statistic is not recommended!
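The comment above can be checked numerically. The sketch below assumes the four data sets are Anscombe's quartet as published in the 1973 paper (labelled I to IV here; how they map onto Fig. A to Fig. D is not assumed). The helper name pearson_r is illustrative only.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient: r = SSxy / sqrt(SSxx * SSyy)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_xx = sum((xi - mean_x) ** 2 for xi in x)
    ss_yy = sum((yi - mean_y) ** 2 for yi in y)
    ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return ss_xy / math.sqrt(ss_xx * ss_yy)

# Anscombe (1973) data; sets I-III share the same X values (assumed transcription).
x_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x_4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x_4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Compute r for each set; the four values agree to two decimal places.
r_values = {name: pearson_r(x, y) for name, (x, y) in quartet.items()}
for name, r in r_values.items():
    print(f"Set {name}: r = {r:.2f}")
```

Identical values of r tell you nothing about whether the four scatter plots look alike, which is why the plot must be inspected before r is interpreted.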

(14.3) DOLL-ECOL: Ecological analysis of smoking and lung cancer rates. In 1955, Richard Doll published an ecological study on smoking and lung cancer in which per capita cigarette consumption in 1930 (CIG) was compared to lung cancer mortality per 100,000 person-years in 1950 (LUNGCA). Data are:

Country      CIG     LUNGCA
---------  -------   ------
USA          1300      20
GrBrit       1100      46
Finland      1100      35
Switzerland   510      25
Canada        500      15
Holland       490      24
Australia     480      18
Denmark       380      17
Sweden        300      11
Norway        250       9
Iceland       230       6

(A) Construct a scatter plot of the relation between cigarette consumption (X) and lung cancer (Y). Label axes, including units of measure. (See The APA Publication Guide, §3.75 - §3.77 for Figure production.) Briefly discuss this graph. 

(B) Determine the correlation between these factors. To save you time I've calculated intermediate statistics: n = 11, x̄ = 603.64, ȳ = 20.55; SSXX = 1,432,255, SSYY = 1,375. (You must calculate SSXY.) Report the correlation coefficient to two decimal places.

(C) Test H0: ρ = 0 vs. H1: ρ ≠ 0. Show all steps and calculations. Is the correlation significant at α = .05? At α = .01? List distributional assumptions.

(D) Replicate your analyses in SPSS. After entering (or downloading) the data, construct the scatter plot with Graph > Scatter > Define. To calculate correlation coefficient statistics click Analyze > Correlate > Bivariate. (To compute sums of squares click the Options button on the Bivariate Correlations dialogue box and check the "Cross-product deviations and covariances" box.) If you find a discrepancy between the work done by hand and the work done with SPSS, track down the problem and correct the error.

(E) Calculate the coefficient of determination. Interpret this statistic.
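If you would like a third check on your hand and SPSS work, the steps in parts (B), (C), and (E) can be sketched in a few lines of Python. This is a minimal sketch, not the required method: variable names are illustrative, the formulas are the usual r = SSXY / sqrt(SSXX × SSYY) and t = r × sqrt(n - 2) / sqrt(1 - r²) with n - 2 degrees of freedom, and the critical values are the two-sided entries for df = 9 copied from a standard t table.

```python
import math

# Data from the table above (Doll, 1955).
cig = [1300, 1100, 1100, 510, 500, 490, 480, 380, 300, 250, 230]
lungca = [20, 46, 35, 25, 15, 24, 18, 17, 11, 9, 6]

n = len(cig)
mean_x = sum(cig) / n
mean_y = sum(lungca) / n

# Sums of squares and cross-products about the means.
ss_xx = sum((x - mean_x) ** 2 for x in cig)
ss_yy = sum((y - mean_y) ** 2 for y in lungca)
ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cig, lungca))

# (B) correlation coefficient and (E) coefficient of determination.
r = ss_xy / math.sqrt(ss_xx * ss_yy)
r_squared = r ** 2

# (C) t statistic with n - 2 degrees of freedom.
df = n - 2
t_stat = r * math.sqrt(df) / math.sqrt(1 - r_squared)

# Two-sided critical values for df = 9, taken from a standard t table (assumption).
T_CRIT = {0.05: 2.262, 0.01: 3.250}
significant = {alpha: abs(t_stat) > crit for alpha, crit in T_CRIT.items()}

print(f"SSXX = {ss_xx:.0f}, SSYY = {ss_yy:.0f}, SSXY = {ss_xy:.2f}")
print(f"r = {r:.2f}, r^2 = {r_squared:.2f}, t = {t_stat:.2f}, df = {df}")
for alpha, flag in significant.items():
    print(f"alpha = {alpha}: significant = {flag}")
```

The printed SSXX and SSYY should match the intermediate statistics given in part (B); if they do not, check the data entry first.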

Data (n = 10) on daily SODIUM intake (mg) and systolic blood pressure (BP; mm Hg) from 10 people with hypertension are:

PERSON   SODIUM     BP
    1      6.8      154
    2      7.0      167
    3      6.9      162
    4      7.2      175
    5      7.3      190
    6      7.0      158
    7      7.0      166
    8      7.5      195
    9      7.3      189
   10      7.1      186

(A) Construct a scatter plot of these data (X = SODIUM ; Y = BP). Discuss your results. (Relation linear? Positive, negative, or no trend evident in scatter plot?)

(B) Compute r and r². Interpret these statistics.

(C) Test H0: ρ = 0. Show all hypothesis testing steps and calculations. Is the correlation significant at α = .01?

(14.5)  IGUANA. You want to describe the relation between female iguana body WEIGHT and the number of EGGS she produces (Hampton, 1994, p. 157). Data from 9 gravid iguanas are:

ID   WEIGHT     EGGS
---  ------    ------
1     0.90       33
2     1.55       50
3     1.30       46
4     1.00       33
5     1.55       53
6     1.80       57
7     1.50       44
8     1.05       31
9     1.70       60

(A) Construct a scatter plot (Let X = WEIGHT and Y = EGGS) and then interpret your findings. 

(B) Calculate the correlation coefficient. Interpret this statistic.

(C) Perform a significance test at α = .01. Show all calculations and hypothesis testing steps. Interpret your results.

A factor thought to influence graduation rates at universities is the scholastic aptitude of entering freshmen. To test this idea, a researcher collected data on graduation rates (UPERCENT: percentage of students graduating within 5 years of entry) and the average ACT scores (like the SAT) of incoming freshmen at 7 colleges (Berk, 1994, p. 82). Data are:

UPERCENT   ACT
76.2        27
57.6        24
55.4        24
59.7        23
86.0        28
46.2        22
66.7        23

(A) Plot these data and interpret your graph. 

(B) Calculate and interpret the correlation coefficient. 

(C) Test the correlation coefficient for significance at α = .05. Show all work, and discuss your findings.

(14.7) FEV.SAV. Download fev.sav. Using SPSS, create a scatter plot of AGE and HEIGHT and then determine the correlation coefficient. Interpret your findings.

(14.8) MAT-MORT.SAV: We want to explore the relation between the percentage of births attended by health care professionals (physicians, nurses, midwives, etc.) and maternal mortality per 100,000 live births. The values for a random sample of 11 countries appear below. Using techniques presented in the chapter, determine whether there is a linear relation between these two variables. (Data are a subset from Pagano & Gauvreau, 2000, p. 407.) As part of your analyses construct a scatter plot, calculate r, and determine a p value for H0: ρ = 0 (two-sided). Interpret your results.

|Nation |% attended |Maternal mortality per 100,000 births |
|Bangladesh |5 |600 |
|Chile |98 |67 |
|Iran |70 |120 |
|Kenya |50 |170 |
|Nepal |6 |830 |
|Netherlands |100 |10 |
|Nigeria |37 |800 |
|Pakistan |35 |500 |
|Panama |96 |60 |
|United States |99 |8 |
|Vietnam |95 |120 |
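If SPSS is not at hand, a two-sided p value for H0: ρ = 0 can be approximated directly from the t statistic. The sketch below is one possible approach under stated assumptions, not the chapter's prescribed method: it integrates the Student t density numerically with Simpson's rule, using only the Python standard library, and the function names t_pdf, two_sided_p, and pearson_r are illustrative.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p value: 1 - 2 * (area under the t density from 0 to |t|).

    The area is approximated with Simpson's rule; steps must be even.
    """
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1 - 2 * area)

def pearson_r(x, y):
    """Sample correlation coefficient: r = SSxy / sqrt(SSxx * SSyy)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - mean_x) ** 2 for xi in x)
    ss_yy = sum((yi - mean_y) ** 2 for yi in y)
    ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return ss_xy / math.sqrt(ss_xx * ss_yy)

# Data from the table above, in row order (Bangladesh ... Vietnam).
attended = [5, 98, 70, 50, 6, 100, 37, 35, 96, 99, 95]
mortality = [600, 67, 120, 170, 830, 10, 800, 500, 60, 8, 120]

r = pearson_r(attended, mortality)
df = len(attended) - 2
t_stat = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
p_value = two_sided_p(t_stat, df)
print(f"r = {r:.2f}, t = {t_stat:.2f}, df = {df}, p = {p_value:.4f}")
```

A quick sanity check on the integration is to feed two_sided_p a tabled critical value: with df = 9, t = 2.262 should return a p value very close to .05. Remember that the p value is only meaningful if the scatter plot supports a linear relation.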

 

Key to Odd Numbered Problems 

Key to Even Numbered Problems (may not be posted)
