Dr. V. Alhanaqtah. Econometrics



Laboratory assignment 7. AUTOCORRELATION

Step 1. Install and activate the necessary packages. Copy and paste the following commands into RStudio:

install.packages("lubridate")
install.packages("sandwich")
install.packages("lmtest")
install.packages("car")
install.packages("zoo")
install.packages("xts")
install.packages("dplyr")
install.packages("broom")
install.packages("ggplot2")
install.packages("quantmod")
install.packages("rusquant")
install.packages("sophisthse")
install.packages("Quandl")
library("lubridate")
library("sandwich")
library("lmtest")
library("car")
library("zoo")
library("xts")
library("dplyr")
library("broom")
library("ggplot2")
library("quantmod")
library("rusquant")
library("sophisthse")
library("Quandl")

Select all and press CTRL+ENTER. Installation and activation take a little time… Now you are ready to work with data.

Operations with dates

Step 2. First we'll work with dates. Let's create a vector of two dates, April 15, 2012 and August 17, 2011:

x<-c("2012-04-15","2011-08-17")

The following command specifies the order of year, month and day:

y<-ymd(x)
y

As a result we see the following:

[1] "2012-04-15 UTC" "2011-08-17 UTC"

Step 3. We can operate with dates much as we operate with numbers. Let's add 20 days to our dates:

y+days(20)

As a result we get the following:

[1] "2012-05-05 UTC" "2011-09-06 UTC"

Let's look at the dates 10 days earlier:

y-days(10)

As a result we get the following:

[1] "2012-04-05 UTC" "2011-08-07 UTC"

Let's extract only the months, and then only the years:

month(y)
year(y)

Creation of time series

Step 4. Now we'll create our first time series: a data set in which one observation corresponds to one day. Note that x is a variable which contains the data, and y gives the dates of the corresponding observations. We have 5 observations (5 random numbers). The starting date is January 1, 2014:

x<-rnorm(5)
x
y<-ymd("2014-01-01")+days(0:4)
y

As a result we get the following:

[1] "2014-01-01 UTC" "2014-01-02 UTC" "2014-01-03 UTC" "2014-01-04 UTC"
[5] "2014-01-05 UTC"

The following command creates a time series in which each value of x corresponds to its date. Observations are ordered by the variable y:

ts<-zoo(x,order.by=y)
ts

As a result we get the following (your numbers will differ, since x is random):

2014-01-01 2014-01-02 2014-01-03 2014-01-04 2014-01-05
 0.1117955  0.4839121  0.1279367 -0.5437245  1.1763361

Step 5. Look at the lagged series. Note that in zoo a positive lag shifts the series one day forward, aligning each date with the next day's value:

lag(ts,1)

Look at the differences between consecutive observations:

diff(ts)

We are usually interested in such differences when we study, for example, currency exchange rates (how the rate changes from day to day).

Step 6. Let's create two new time series. We'll work with the same 5 numbers, but now as quarterly and then monthly data. Here zooreg creates a regular time series, start indicates the starting date, yearqtr indicates quarterly data (where 01 is the first quarter), and freq indicates the frequency of the data, equal to 4 observations per year:

ts2<-zooreg(x,start=as.yearqtr("2014-01"),freq=4)
ts2

In the following time series yearmon indicates monthly data (where 01 is the first month); the frequency is equal to 12 observations per year:

ts3<-zooreg(x,start=as.yearmon("2014-01"),freq=12)
ts3
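To see how lag() and diff() from Step 5 fit together, here is a minimal sketch (assuming the zoo series ts created in Step 4). In zoo, lag(ts,-1) aligns each date with the previous day's value, so the first difference can be reproduced by hand:

manual_diff <- ts - lag(ts,-1)                        # current value minus previous day's value
all.equal(coredata(manual_diff), coredata(diff(ts)))  # should print TRUE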
Basic operations with time series

Step 7. Let's consider a time series built into R: "US Investment Data". This is an annual time series from 1963 to 1982 with 7 variables. Look at the description of the data set and load the data:

help(Investment)
data(Investment)

Look where the data start and end:

start(Investment)
end(Investment)

Or you may use the following command to see the start, the end and the frequency of the data, as well as all the time indexes:

time(Investment)

You may also see just the values of the 7 variables, without time indexes:

coredata(Investment)

Omitted observations would appear here as NA.

Step 8. Now we'll learn how to work with omitted values. To begin with, we create a new artificial data set:

dna<-Investment

Now we make some omissions artificially. We omit 2 values: the first in the 1st row / 2nd column, the second in the 5th row / 3rd column:

dna[1,2]<-NA
dna[5,3]<-NA

Now we reconstruct the omitted values. The first approach is linear approximation (interpolation): the computer takes the two values on either side of the omitted one and interpolates between them:

na.approx(dna)

The second approach copies the previous value:

na.locf(dna)

Downloading data from external sources

Step 9. Consider open sources of financial data on the Internet. Some financial databases are:

finam.ru
sophisthse.ru

Let's have a look at the prices of Apple shares from Google Finance. Note that we need to adjust a locale setting in order to read the dates in English if your Windows is non-English:

Sys.setlocale("LC_TIME","C")

Now download data on the share prices of the Apple corporation:

getSymbols(Symbols="AAPL",from="2010-01-01",to="2014-02-03",src="google")

Google Finance stopped providing data in March 2018, so you could try setting src="yahoo" instead:

getSymbols(Symbols="AAPL",from="2010-01-01",to="2014-02-03",src="yahoo")

Glimpse at the beginning and the end of the data set:

head(AAPL)
tail(AAPL)

Step 10. Build a default graph, where the opening/closing prices and the minimum/maximum prices are all shown in one graph:

plot(AAPL)

Or you can split the information into 4 graphs:

autoplot(AAPL)

The following graph is better for technical analysis:

autoplot(AAPL[,1:4],facets=NULL)

An advanced chart, where you can also see the trading volumes (at the bottom):

chartSeries(AAPL)

Robust confidence intervals

Now we move on to the problem of autocorrelation, which is very common in time series.

Step 11. We go back to "US Investment Data", the annual time series from 1963 to 1982 with 7 variables. Create a data set in R:

d<-Investment

Convert it into an irregular (zoo) time series:

d<-as.zoo(Investment)

Plot the Investment and GNP indicators (the 1st and 2nd columns of the data set) in one graph:

autoplot(d[,1:2])

Step 12. Estimate a model using Ordinary Least Squares. Here real investment depends on the real interest rate and real GNP. It is logical to expect autocorrelation in such a model:

model<-lm(data=d,RealInv~RealInt+RealGNP)
summary(model)

Step 13. Test the hypothesis that the coefficients are equal to zero (i.e., that they are insignificant):

coeftest(model)

It shows that only the RealGNP coefficient is significant: an increase in RealGNP by 1 unit leads to an increase in RealInv by 0.16 units. Construct confidence intervals:

confint(model)

Step 14. Is there autocorrelation in the model? We use a graphical method: analysis of the residuals. We check how the current values of the residuals depend on the previous values. To carry out the analysis, we augment the data set d with .resid (residual) and .fitted (fitted) values:

d_aug<-augment(model,as.data.frame(d))

We are interested in the residuals.
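Before plotting, the same check can be done numerically. Here is a minimal sketch (assuming the model estimated in Step 12) that computes the sample first-order autocorrelation of the residuals; a value far from zero points in the same direction as the scatter plot built below:

r <- resid(model)           # OLS residuals
cor(r[-1], r[-length(r)])   # correlation between current and previous residuals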
Build a graph of the residuals, where lag(.resid) is the previous value (X-axis) and .resid is the current value (Y-axis):

qplot(data=d_aug,lag(.resid),.resid)

There is a relationship between the previous and current values of the residuals.

Step 15. We need to correct for autocorrelation: instead of the usual standard errors we have to use robust standard errors. First, let's look at the usual covariance matrix, which is inconsistent in the presence of autocorrelation, and then at a matrix which is heteroscedasticity and autocorrelation consistent (HAC):

vcov(model)
vcovHAC(model)

"Old" matrix:

            (Intercept)     RealInt       RealGNP
(Intercept) 620.7706170  8.47475285 -0.5038304429
RealInt       8.4747529  5.61097245 -0.0114567949
RealGNP      -0.5038304 -0.01145679  0.0004229789

HAC matrix:

            (Intercept)     RealInt       RealGNP
(Intercept) 615.5987887  9.24170225 -0.5679537232
RealInt       9.2417023 14.53971300 -0.0222236251
RealGNP      -0.5679537 -0.02222363  0.0005448671

We see that in the HAC matrix the variance estimates (and hence the standard errors) are mostly larger than in the "old", usual matrix.

Step 16. The hypothesis testing and the confidence intervals in Step 13 were incorrect, because we did not take the presence of autocorrelation into account and used the usual matrix. Now we test the significance of the coefficients and construct correct confidence intervals using the HAC matrix:

coeftest(model,vcov.=vcovHAC(model))

We see that the coefficient of RealGNP is significant (***), while the other coefficients are insignificant. To construct correct confidence intervals, we first put the results of the standard-error calculations into a separate table conftable, from which we need only the estimates and the standard errors. We extract them into the table ci, where 1 is the 1st column of conftable and 2 is the 2nd column:

conftable<-coeftest(model,vcov.=vcovHAC(model))
ci<-data.frame(estimate=conftable[,1],se_ac=conftable[,2])

Now we add the left and right borders of the 95% confidence interval as estimate ± 1.96 × Std. Error:

ci<-mutate(ci,left_95=estimate-1.96*se_ac,right_95=estimate+1.96*se_ac)
ci

As a result we get the following confidence intervals:

     estimate       se_ac     left_95   right_95
1 -12.5336006 24.81126334 -61.1636767 36.0964756
2  -1.0014380  3.81309756  -8.4751092  6.4722332
3   0.1691365  0.02334239   0.1233854  0.2148875

We can see that the robust confidence intervals are wider than the "old" confidence intervals because of the autocorrelation.

Formal tests for autocorrelation

Step 17. Durbin-Watson test.
H0: no autocorrelation
Ha: autocorrelation of the 1st order

dwt(model)

The p-value is less than 5% (0.034 < 0.05), so H0 is rejected. There is autocorrelation in the model.

Step 18. Breusch-Godfrey test.
H0: no autocorrelation
Ha: autocorrelation of any given order

Presume that the maximum possible order of autocorrelation is 2:

bgtest(model,order=2)

The p-value is higher than 5% (0.1393 > 0.05), so H0 is not rejected. We see that the two tests give different results. What does this mean? Note that the phrase "the hypothesis is not rejected" means that there is not enough data to reject the hypothesis. In other words, there was enough data for the DW test to reject the hypothesis, but not enough for the BG test to reject it. To sum up, H0 is rejected: there is autocorrelation in the model.
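For intuition, the DW statistic reported by dwt(model) can also be reproduced by hand. A minimal sketch (assuming the model from Step 12); values near 2 indicate no first-order autocorrelation, while values well below 2 indicate positive autocorrelation:

e <- resid(model)           # OLS residuals
sum(diff(e)^2) / sum(e^2)   # Durbin-Watson statistic, should match dwt(model)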
Graded assignment

First, install and activate all the packages from Step 1.

Exercise 1. Work with the Solow data set from the "Ecdat" package:

install.packages("Ecdat")
library("Ecdat")
help(Solow)
h<-Solow

Estimate the dependence of output q on capital k and technology A. Which command did you use? ……………….
Estimate the usual covariance matrix and the matrix consistent under heteroscedasticity and autocorrelation. Which commands did you use? ………………………. ………………………………..

Exercise 2. Carry out the Durbin-Watson test. What is the value of the DW statistic? ……………. What is the statistical inference?
a) autocorrelation;
b) no autocorrelation.

Exercise 3. Estimate the dependence of output q only on capital k. Which command did you use? …………… Carry out the Breusch-Godfrey test with the correlation order equal to 3. What is the value of the BG statistic? ……. What is the statistical inference?
a) autocorrelation;
b) no autocorrelation.

Finish and save your lab

File / Save as… ................
