Correlation and Regression - IFT

R09 Correlation and Regression

IFT Notes

1. Introduction
2. Correlation Analysis
  2.1. Scatter Plots
  2.2. Correlation Analysis
  2.3. Calculating and Interpreting the Correlation Coefficient
  2.4. Limitations of Correlation Analysis
  2.5. Uses of Correlation Analysis
  2.6. Testing the Significance of the Correlation Coefficient
3. Linear Regression
  3.1. Linear Regression with One Independent Variable
  3.2. Assumptions of the Linear Regression Model
  3.3. The Standard Error of Estimate
  3.4. The Coefficient of Determination
  3.5. Hypothesis Testing
  3.6. Analysis of Variance in a Regression with One Independent Variable
  3.7. Prediction Intervals
  3.8. Limitations of Regression Analysis
4. Summary

This document should be read in conjunction with the corresponding reading in the 2017 Level II CFA® Program curriculum. Some of the graphs, charts, tables, examples, and figures are copyright 2016, CFA Institute. Reproduced and republished with permission from CFA Institute. All rights reserved.

Required disclaimer: CFA Institute does not endorse, promote, or warrant the accuracy or quality of the products or services offered by IFT. CFA Institute, CFA®, and Chartered Financial Analyst® are trademarks owned by CFA Institute.

Copyright © IFT. All rights reserved.

ift.world

1. Introduction

In this reading, we look at two important tools for examining the relationship between two or more financial variables: correlation analysis and regression analysis. For example, we may want to determine whether there is a relationship between the returns of the U.S. stock market and the Japanese stock market over the past five years, or between unemployment and inflation.

2. Correlation Analysis

In this section, we look at two methods to examine how two sets of data are related to each other: scatter plots and correlation analysis.

2.1. Scatter Plots

A scatter plot is a graph that shows the relationship between the observations for two data series in two dimensions (x-axis and y-axis). The scatter plot below is reproduced from the curriculum:

Figure 1. Scatter Plot of Annual Money Supply Growth Rate and Inflation Rate by Country, 1970–2001

Interpretation of Figure 1:

The two data series here are the average growth in money supply (on the x-axis) plotted against the average annual inflation rate (on the y-axis) for six countries.

Each point on the graph represents the (money growth, inflation rate) pair for one country. The six points show that countries with higher money supply growth tend to have higher inflation.

2.2. Correlation Analysis

Correlation analysis measures the strength of the relationship between two variables and expresses it as a single number, the correlation coefficient. In particular, the correlation coefficient measures the direction and extent of linear association between two variables.

Characteristics of the correlation coefficient:

A correlation coefficient has no units.
The sample correlation coefficient is denoted by r.
The value of r always satisfies -1 ≤ r ≤ 1.
A value of r greater than 0 indicates a positive linear association between the two variables.
A value of r less than 0 indicates a negative linear association between the two variables.
A value of r equal to 0 indicates no linear relation between the two variables.

The three scatter plots below show a positive linear, negative linear, and no linear relation between two variables A and B. They have correlation coefficients of +1, -1 and 0 respectively.

Figure 2. Variables with a Correlation of 1.

Figure 3. Variables with a Correlation of -1.

Figure 4. Variables with a Correlation of 0.
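The three cases can also be checked numerically. The sketch below is illustrative only: the data for A and B are hypothetical values chosen to reproduce correlations of +1, -1 and 0, and `sample_correlation` is a helper name introduced here, implementing the sample correlation formula from Section 2.3.

```python
import math


def sample_correlation(xs, ys):
    """Return the sample correlation Cov(X, Y) / (s_x * s_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance and sample variances all divide by n - 1.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data, chosen only to reproduce the three textbook cases.
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b_positive = [2.0 * v + 1.0 for v in a]   # exact upward-sloping line
b_negative = [10.0 - 3.0 * v for v in a]  # exact downward-sloping line
b_none = [2.0, 1.0, 4.0, 1.0, 2.0]        # deviations cancel, so covariance is 0

r_positive = sample_correlation(a, b_positive)
r_negative = sample_correlation(a, b_negative)
r_none = sample_correlation(a, b_none)

print(round(r_positive, 6), round(r_negative, 6), round(r_none, 6))
```

Note that the slope of the line does not matter: any exact upward-sloping line gives r = +1 and any exact downward-sloping line gives r = -1.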

2.3. Calculating and Interpreting the Correlation Coefficient

In order to calculate the correlation coefficient between two variables, X and Y, we need the following:
1. Covariance between X and Y, denoted by Cov(X, Y)
2. Standard deviation of X, denoted by s_x
3. Standard deviation of Y, denoted by s_y

This is the formula for computing the sample covariance of X and Y:

$$\operatorname{Cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$

The table below illustrates how to apply the covariance formula. The data are the money supply growth rate (Xi) and the inflation rate (Yi) for six different countries. $\bar{X}$ represents the average money supply growth rate and $\bar{Y}$ represents the average inflation rate.

Country              Xi       Yi       Cross-Product   Squared Deviations (X)   Squared Deviations (Y)
Australia            0.1166   0.0676   0.000169        0.000534                 0.000053
Canada               0.0915   0.0519   0.000017        0.000004                 0.000071
New Zealand          0.1060   0.0815   0.000265        0.000156                 0.000449
Switzerland          0.0575   0.0339   0.000950        0.001296                 0.000697
United Kingdom       0.1258   0.0758   0.000501        0.001043                 0.000240
United States        0.0634   0.0509   0.000283        0.000906                 0.000088
Sum                  0.5608   0.3616   0.002185        0.003939                 0.001598
Average              0.0935   0.0603
Covariance                             0.000437
Variance                                               0.000788                 0.000320
Standard deviation                                     0.028071                 0.017889

Notes:
1. Divide the cross-product sum by n - 1 (with n = 6) to obtain the covariance of X and Y.
2. Divide the squared deviations sums by n - 1 (with n = 6) to obtain the variances of X and Y.
Source: International Monetary Fund.
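The table's summary rows can be reproduced with a few lines of code. This is a minimal sketch that takes the six (Xi, Yi) pairs from the table as given; the variable names are our own. Because the table rounds intermediate values, the computed figures may differ from the printed ones in the last decimal place or two.

```python
import math

# Money supply growth (X) and inflation (Y) for the six countries in the table.
x = [0.1166, 0.0915, 0.1060, 0.0575, 0.1258, 0.0634]
y = [0.0676, 0.0519, 0.0815, 0.0339, 0.0758, 0.0509]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Column-by-column quantities, matching the table layout.
cross_products = [(xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)]
sq_dev_x = [(xi - mean_x) ** 2 for xi in x]
sq_dev_y = [(yi - mean_y) ** 2 for yi in y]

# Notes 1 and 2: divide each column sum by n - 1.
cov_xy = sum(cross_products) / (n - 1)
var_x = sum(sq_dev_x) / (n - 1)
var_y = sum(sq_dev_y) / (n - 1)
std_x = math.sqrt(var_x)
std_y = math.sqrt(var_y)

print(f"covariance = {cov_xy:.6f}")
print(f"variance of X = {var_x:.6f}, variance of Y = {var_y:.6f}")
print(f"std dev of X = {std_x:.6f}, std dev of Y = {std_y:.6f}")
```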

Given the covariance between X and Y and the two standard deviations, the sample correlation can be easily calculated.

The following equation shows the formula for computing the sample correlation of X and Y:

$$r = \frac{\operatorname{Cov}(X, Y)}{s_x s_y} = \frac{0.000437}{(0.028071)(0.017889)} = 0.870236$$
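As a quick check of this arithmetic, the sketch below plugs the rounded covariance and standard deviations from the table into the correlation formula. The function name `correlation_from_summary` is hypothetical, introduced only for this illustration.

```python
def correlation_from_summary(cov_xy, std_x, std_y):
    """Sample correlation from a covariance and two standard deviations."""
    return cov_xy / (std_x * std_y)


# Rounded summary statistics taken from the table above.
r = correlation_from_summary(0.000437, 0.028071, 0.017889)

print(f"r = {r:.4f}")  # prints "r = 0.8702"
```

A value this close to +1 indicates a strong positive linear association between money supply growth and inflation in this sample of six countries.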

LO.a: Calculate and interpret a sample covariance and a sample correlation coefficient; and interpret a scatter plot.

2.4. Limitations of Correlation Analysis

Correlation analysis has certain limitations:

Two variables can have a strong non-linear relation and still have a very low correlation. Recall that correlation is a measure of the linear relationship between two variables.

The correlation can be unreliable when outliers are present.

The correlation may be spurious. Spurious correlation refers to the following situations:

o The correlation between two variables that reflects chance relationships in a particular data set.

o The correlation induced by a calculation that mixes each of two variables with a third variable.

o The correlation between two variables arising not from a direct relation between them, but from their relation to a third variable. Example: shoe size and vocabulary of school children. The third variable here is age: larger shoe sizes simply indicate older children, who tend to have a larger vocabulary.
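The first two limitations are easy to see with small made-up data sets. The sketch below is illustrative only: all numbers are hypothetical, and `sample_correlation` is a helper defined here for the purpose. The first case has an exact relation Y = X^2 yet zero correlation; the second shows a single outlier turning a zero correlation into a very high one.

```python
import math


def sample_correlation(xs, ys):
    """Return the sample correlation Cov(X, Y) / (s_x * s_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    return cov / math.sqrt(var_x * var_y)


# 1. Strong non-linear relation, no linear relation:
#    Y is fully determined by X, but the positive and negative halves cancel.
x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [v ** 2 for v in x]
r_nonlinear = sample_correlation(x, y)

# 2. Effect of one outlier (hypothetical data):
#    the base sample has zero correlation; adding the point (20, 20) dominates it.
x_base = [1.0, 2.0, 3.0, 4.0, 5.0]
y_base = [2.0, 1.0, 4.0, 1.0, 2.0]
r_without_outlier = sample_correlation(x_base, y_base)
r_with_outlier = sample_correlation(x_base + [20.0], y_base + [20.0])

print(f"non-linear case:  r = {r_nonlinear:.3f}")
print(f"without outlier:  r = {r_without_outlier:.3f}")
print(f"with outlier:     r = {r_with_outlier:.3f}")
```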

LO.b: Describe the limitations to correlation analysis.

2.5. Uses of Correlation Analysis

The uses of correlation analysis are highlighted through six examples in the curriculum. Instead of reproducing the examples, the specific scenarios in which correlation analysis is used are listed below:

Evaluating economic forecasts: Inflation is often predicted using the change in the consumer price index (CPI). By plotting actual against predicted inflation, analysts can determine the accuracy of their inflation forecasts.

Style analysis correlation: Correlation analysis is used in determining the appropriate benchmark to evaluate a portfolio manager's performance. For example, assume the portfolio managed consists of 200 small value stocks. The Russell 2000 Value Index and the Russell 2000 Growth Index are commonly used as benchmarks to measure the small-cap value and small-cap growth equity segments, respectively. If there is a high correlation between the returns to the two indexes, then it may be difficult to distinguish between small-cap growth and small-cap value as different styles.

Exchange rate correlations: Correlation analysis is also used to understand the correlations among many asset returns. This helps in asset allocation, hedging strategy and diversification of the portfolio to reduce risk. Historical correlations are used to set expectations of future correlation. For example, suppose an investor with exposure to foreign currencies needs to decide whether to increase his exposure to the Canadian dollar or to the Japanese yen. By analyzing the historical correlations between USD returns to holding the Canadian dollar and USD returns to holding the Japanese yen, he will be able to come to a conclusion. If the returns are not correlated, then holding both assets helps in reducing risk.

Correlations among stock return series: Analyzing the correlations among stock market indexes such as large-cap, small-cap and mid-cap helps in asset allocation and diversifying risk. For instance, if there is a high correlation between the returns to the large-cap index and the small-cap index, then their combined allocation may be reduced to diversify risk.

Correlations of debt and equity returns: Similarly, the correlation among different asset classes, such as equity and debt, is used in portfolio diversification and asset allocation. For example, high-yield corporate bonds may have a high correlation to equity returns, whereas long-term government bonds may have a low correlation to equity returns.

Correlations among net income, cash flow from operations, and free cash flow to the firm: Correlation analysis shows whether an analyst's decision to value a firm based only on net income (NI), ignoring cash flow from operations (CFO) and free cash flow to the firm (FCFF), is sound. FCFF is the cash flow available to debt holders and shareholders after all operating expenses have been paid and investments in working and fixed capital have been made. If there is a low correlation between NI and FCFF, then the analyst's decision to use NI instead of FCFF/CFO to value a company is questionable.
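Several of the scenarios above rest on the idea that combining assets with low correlation reduces risk. The sketch below illustrates this with the standard two-asset portfolio variance formula, which is not stated in these notes and is brought in here as an assumption; the weights and volatilities are hypothetical, and `two_asset_volatility` is a name introduced for this example.

```python
import math


def two_asset_volatility(w1, sigma1, sigma2, rho):
    """Standard deviation of a portfolio with weight w1 in asset 1 and 1 - w1 in asset 2.

    Uses: var_p = w1^2 s1^2 + w2^2 s2^2 + 2 w1 w2 rho s1 s2.
    """
    w2 = 1.0 - w1
    variance = (
        (w1 * sigma1) ** 2
        + (w2 * sigma2) ** 2
        + 2.0 * w1 * w2 * rho * sigma1 * sigma2
    )
    return math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding error


# Hypothetical inputs: equal weights, volatilities of 12% and 8%.
for rho in (1.0, 0.5, 0.0, -0.5):
    vol = two_asset_volatility(0.5, 0.12, 0.08, rho)
    print(f"rho = {rho:+.1f} -> portfolio volatility = {vol:.4f}")
```

With perfectly correlated assets the portfolio volatility is just the weighted average of the two volatilities; as the correlation falls, the portfolio volatility falls below that average, which is the diversification benefit the scenarios refer to.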

2.6. Testing the Significance of the Correlation Coefficient

The objective of a significance test is to assess whether the correlation between two random variables is real or merely the result of chance. If it can be ascertained that the relationship is not a result of chance, then one variable can be used to predict another variable using the correlation coefficient.

A t-test is used to determine whether the correlation between two variables is significant. The population correlation coefficient is denoted by ρ (rho). As long as the two variables are normally distributed, we can use the sample correlation, r, to test the null hypothesis that the population correlation is zero. The formula for the t-test is:

$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$$

The test statistic has a t-distribution with n - 2 degrees of freedom, where n denotes the number of observations.
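As a worked illustration, the statistic can be evaluated for the money supply growth and inflation sample from Section 2.3, where r = 0.870236 and n = 6. The arithmetic below is a sketch rounded to three decimal places, not a figure taken from the curriculum:

$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} = \frac{0.870236\sqrt{6 - 2}}{\sqrt{1 - 0.870236^2}} = \frac{1.740472}{0.492635} \approx 3.533$$

This statistic has n - 2 = 4 degrees of freedom.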

How to use the t-test to determine significance:

1. Write the null hypothesis H0: ρ = 0 and the alternative hypothesis Ha: ρ ≠ 0. Because the alternative hypothesis is that the correlation is not equal to zero, this is a two-tailed test.

2. Specify the level of significance. Determine the degrees of freedom.

3. Determine the critical value, tc, for the given significance level and degrees of freedom.

4. Calculate the test statistic, $t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$.

5. Make a decision to reject the null hypothesis H0, or fail to reject H0. If the absolute value of t > tc, then reject H0. If the absolute value of t ≤ tc, then fail to reject H0.

6. Interpret the decision:
a. If you reject H0, then there is a significant linear correlation.
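The steps above can be sketched in code. This is a minimal illustration under stated assumptions: `correlation_t_test` is a hypothetical helper, the 5% significance level is our choice, and the critical value 2.776 (two-tailed, 4 degrees of freedom) is read from a standard t-table rather than computed. The first call uses the sample correlation from Section 2.3; the second uses a made-up weaker correlation with the same sample size.

```python
import math


def correlation_t_test(r, n, t_critical):
    """Two-tailed test of H0: rho = 0 against Ha: rho != 0.

    Returns the test statistic and whether H0 is rejected.
    """
    t_stat = r * math.sqrt(n - 2) / math.sqrt(1.0 - r ** 2)
    return t_stat, abs(t_stat) > t_critical


# Critical value for a two-tailed test at the 5% level with
# n - 2 = 4 degrees of freedom, read from a standard t-table.
T_CRITICAL_DF4_5PCT = 2.776

# Money supply growth vs. inflation sample: r = 0.870236, n = 6.
t_money, reject_money = correlation_t_test(0.870236, 6, T_CRITICAL_DF4_5PCT)

# Hypothetical weaker correlation with the same number of observations.
t_weak, reject_weak = correlation_t_test(0.50, 6, T_CRITICAL_DF4_5PCT)

print(f"r = 0.870236: t = {t_money:.3f}, reject H0: {reject_money}")
print(f"r = 0.500000: t = {t_weak:.3f}, reject H0: {reject_weak}")
```

With only six observations, a correlation of 0.50 is not statistically distinguishable from zero at the 5% level, while the much stronger 0.87 is.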

