Working with Financial Time Series Data in R

Eric Zivot

Department of Economics, University of Washington

June 30, 2014

Preliminary and incomplete: Comments welcome

Introduction

In this tutorial, I provide a comprehensive summary of specifying, manipulating, and visualizing various kinds of financial time series data in R. Base R has limited functionality for handling general time series data. Fortunately, there are several R packages - lubridate, quantmod, timeDate, timeSeries, zoo, xts, xtsExtra - with functions for creating, manipulating and visualizing date, date-time and time series objects. I will illustrate how to use the functions in these R packages for handling financial time series.

This tutorial is organized as follows.

1. Overview of time series objects in R
2. Overview of date and date-time objects in R
   a. Date class
   b. POSIXt classes
   c. Working with dates and times using the lubridate package
   d. timeDate class
3. The ts and mts classes for representing regularly spaced calendar time series
4. The zoo class for representing general time series
5. The xts class: an extension of zoo
6. The timeSeries class for representing general time series

Overview of Time Series Objects in R

The core data object for holding data in R is the data.frame object. A data.frame is a rectangular data object whose columns can be of different types (e.g., numeric, character, logical, Date, etc.). The data.frame object, however, is not designed to work efficiently with time series data. In particular, sub-setting and merging data based on a time index is cumbersome, and transforming and aggregating data based on a time index is not at all straightforward. Furthermore, the default plotting methods in R are not designed for handling time series data. Hence, there is a need for a flexible time series class in R with a rich set of methods for manipulating and plotting time series data.
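To make the point about sub-setting and aggregating concrete, here is a minimal base R sketch. The data are hypothetical (the names prices, in.window and weekly.mean are made up for illustration), and the code only shows the kind of manual bookkeeping a plain data.frame requires.

```r
# Hypothetical daily price data stored in a plain data.frame
prices = data.frame(
  date  = as.Date("2014-01-01") + 0:9,
  price = c(100, 101, 99, 102, 103, 101, 104, 105, 103, 106)
)

# Sub-setting on the time index needs an explicit logical condition
in.window = prices$date >= as.Date("2014-01-03") &
            prices$date <= as.Date("2014-01-06")
sub.prices = prices[in.window, ]
sub.prices

# Aggregating by calendar week needs a grouping variable built by hand
prices$week = format(prices$date, "%Y-%U")
weekly.mean = aggregate(price ~ week, data = prices, FUN = mean)
weekly.mean
```

Nothing here is difficult, but every operation has to be spelled out in terms of the date column rather than being a method of the object itself.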

Base R has limited functionality for handling general time series data. For example, univariate and multivariate regularly spaced calendar time series data can be represented using the ts and mts classes, respectively. These classes have a limited set of method functions for manipulating and plotting time series data. However, these classes cannot adequately represent more general irregularly spaced non-calendar time series such as intra-day transaction-level financial price and quote data. Fortunately, there are several R packages that can be used to handle general time series data.
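As a brief illustration of the ts and mts classes, the following sketch builds a regularly spaced quarterly series in base R. The values and the object names (gdp, gdp.2014, both) are hypothetical and serve only to show the mechanics.

```r
# Hypothetical quarterly series: 8 observations starting in 2013 Q1
gdp = ts(c(1.2, 1.4, 1.1, 1.6, 1.8, 1.7, 2.0, 2.1),
         start = c(2013, 1), frequency = 4)
class(gdp)
frequency(gdp)

# window() sub-sets on the regular time index
gdp.2014 = window(gdp, start = c(2014, 1), end = c(2014, 4))
gdp.2014

# Binding two aligned ts objects column-wise gives a multivariate series
both = cbind(gdp = gdp, gdp.sq = gdp^2)
class(both)
```

Note that the time index is implied by start and frequency; there is no way to attach an arbitrary vector of irregular time stamps to a ts object.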

The table below lists the main time series objects that are available in R and their respective packages.

Time Series Object   Package      Description
fts                  fts          An R interface to tslib (a time series library in C++)
its                  its          An S4 class for handling irregular time series
irts                 tseries      Irregular time series objects: scalar or vector valued
                                  time series indexed by a time-stamp of class "POSIXct"
timeSeries           timeSeries   Rmetrics package of time series tools and utilities.
                                  Similar to the Tibco S-PLUS timeSeries class
ti                   tis          Functions and S3 classes for time indexes and time
                                  indexed series, which are compatible with FAME frequencies
ts, mts              stats        Regularly spaced time series objects
zoo                  zoo          S3 class of indexed totally ordered observations, which
                                  includes irregular time series
xts                  xts          Extension of the zoo class

The ts and mts classes in base R are suitable for representing regularly spaced calendar time series such as monthly sales or quarterly real GDP. In addition, several of the time series modeling functions in base R and in several R packages take ts and mts objects as data inputs. For handling more general irregularly spaced financial time series, by far the most used packages are timeSeries, zoo and xts. The timeSeries package is part of the suite of Rmetrics packages for financial data analysis and computational finance created by Diethelm Wuertz and his colleagues at ETH Zurich (see ). In these packages, timeSeries objects are the core data objects. However, outside of Rmetrics, timeSeries objects are not as frequently used as zoo and xts objects for representing time series data. Hence, in this tutorial I will focus mostly on using zoo and xts objects for handling general time series.1

Time series data represented by timeSeries, zoo and xts objects have a similar structure: the time index is stored as a vector in some (typically ordered) date-time object, and the data is stored in some rectangular data object. The resulting timeSeries, zoo or xts objects combine the time index and data into a single object. These objects can then be manipulated and visualized using various method functions.

Before discussing the time series objects in detail, I will give a comprehensive overview of the most useful date and date-time objects available in R. This knowledge is required to fully understand how to effectively work with time series objects in R.

Overview of Date and Date-Time Objects in R

There are several ways to represent a time index (sequence of dates or date-times) in R. Table 1 summarizes the main time index classes available in R.

Class            Package    Description
chron            chron      Represents calendar dates and times within the day as the
                            (fractional) number of days since the beginning of 1970
                            as a numeric vector. Does not control for time zones.
Date             base       Represents calendar dates as the number of days since
                            1970-01-01.
yearmon          zoo        Represents monthly data. Internally it holds the data as
                            year plus 0 for January, 1/12 for February, 2/12 for
                            March and so on, so that its internal representation is
                            the same as the ts class with frequency = 12.
yearqtr          zoo        Represents quarterly data. Internally it holds the data
                            as year plus 0 for Quarter 1, 1/4 for Quarter 2 and so
                            on, so that its internal representation is the same as
                            the ts class with frequency = 4.
POSIXct          base       Represents calendar dates and times within the day as the
                            (signed) number of seconds since the beginning of 1970
                            as a numeric vector. Supports various time zone
                            specifications (e.g. GMT, PST, EST etc.).
POSIXlt          base       Represents local dates and times within the day as a
                            named list of vectors with date-time components.
timeDate (S4)    timeDate   The Rmetrics timeDate S4 class fulfils the conventions of
                            the ISO 8601 standard as well as of the ANSI C and POSIX
                            standards. Beyond these standards, Rmetrics has added
                            the "Financial Center" concept, which makes it possible
                            to handle data records collected in different time zones
                            and combine them so that the time stamps are always
                            correct with respect to your personal financial center
                            or, alternatively, to the GMT reference time. timeDate
                            is almost compatible with the timeDate class in Tibco's
                            S-PLUS.

Table 1 Date index classes in R

1 A somewhat dated but still very useful survey of working with financial time series in R, especially with the functions in the Rmetrics suite of packages, is available in the free ebook "A Discussion of Time Series in R for Finance" by Diethelm Wuertz, Yohan Chalabi and Andrew Ellis. This book can be downloaded from the Rmetrics website.

The base R Date class handles dates (without times), and is the recommended class for representing financial data that are observed on discrete dates without regard to the time of day (e.g., daily closing prices). The base R POSIXct and POSIXlt classes allow for dates and times with control for time zones. These are the recommended classes for representing dates associated with financial data observed at particular times within a day (e.g., prices or quotes observed during the trading hours of a day). The chron class is similar but is not used as often as the POSIXt classes.2 The yearmon and yearqtr classes from the zoo package are convenient for representing regularly spaced monthly and quarterly data, respectively, when it is not necessary to specify exactly when during the month or quarter the data is observed. The Rmetrics timeDate class is an S4 class very similar to the S-PLUS timeDate class3, is based on the POSIX standards, and is used throughout the Rmetrics suite of packages.
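The difference between the three base R classes can be seen from their internal representations. The following is a small sketch using only base R; the object names d, ct and lt are arbitrary, and the time zone is fixed to "UTC" so that the result does not depend on the local machine.

```r
# Date: number of days since 1970-01-01
d = as.Date("1970-01-02")
as.numeric(d)     # 1 day

# POSIXct: number of seconds since 1970-01-01 00:00:00 UTC
ct = as.POSIXct("1970-01-02 00:00:00", tz = "UTC")
as.numeric(ct)    # 86400 seconds

# POSIXlt: a list of date-time components
lt = as.POSIXlt(ct)
lt$year + 1900    # years are stored relative to 1900
lt$mon + 1        # months are stored as 0-11
lt$mday
```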

The Date Class (base R)

Use the Date class to represent a time index only involving dates but not times within a day. The Date class by default represents dates internally as the number of days since January 1, 1970. You create Date objects from a character string representing a date using the as.Date() function. The default format is "YYYY/m/d" or "YYYY-m-d", where YYYY represents the four-digit year, m represents the month and d represents the day of the month. For example,

> my.date = as.Date("1970/1/1")
> my.date
[1] "1970-01-01"
> class(my.date)
[1] "Date"
> as.numeric(my.date)
[1] 0
> myDates = c("2013-12-19", "2003-12-20")
> as.Date(myDates)
[1] "2013-12-19" "2003-12-20"

Use the format argument to specify the input format of the date if it is not in the default format:

> as.Date("1/1/1970", format="%m/%d/%Y")
[1] "1970-01-01"
> as.Date("January 1, 1970", format="%B %d, %Y")
[1] "1970-01-01"
> as.Date("01JAN70", format="%d%b%y")
[1] "1970-01-01"

2 Spector (2004) gives an excellent overview of the chron, Date, and POSIXt classes in R.
3 Some might say "ripped off" from.

Notice that the output format is always in the form "YYYY-mm-dd" regardless of the input format. To change the displayed output format of a date, use the format() function:

> format(my.date, "%b %d, %Y")
[1] "Jan 01, 1970"

Some date formats provide insufficient information to be unambiguously represented as a Date object. For example,

> as.Date("Jan 1970", format="%b %Y")
[1] NA
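One common base R workaround, sketched here under the assumption that the missing day can be taken to be the first of the month, is to paste a day onto the string before parsing. The object names x and d are arbitrary, and parsing "%b" assumes an English (or C) locale for the month abbreviation. The yearmon class from the zoo package listed in Table 1 is designed for exactly this kind of monthly index.

```r
# Assumption: a month-year string refers to the first day of that month
x = "Jan 1970"
d = as.Date(paste("01", x), format = "%d %b %Y")
d
class(d)
```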

Table 2 below gives the standard date format codes.

Code   Value                               Example
%d     Day of the month (decimal number)   23
%m     Month (decimal number)              11
%b     Month (abbreviated)                 Jan
%B     Month (full name)                   January
%y     Year (2 digit)                      90
%Y     Year (4 digit)                      1990

Table 2. Format codes for dates
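The numeric format codes can be checked directly with format() and as.Date(). This short sketch uses an arbitrary date (the object name d is made up) and avoids the %b and %B codes, whose output depends on the locale.

```r
d = as.Date("1990-11-23")

# Format codes used for output
format(d, "%d")          # day of the month
format(d, "%m")          # month as a number
format(d, "%y")          # two-digit year
format(d, "%Y")          # four-digit year
format(d, "%d/%m/%Y")    # codes can be combined with literal text

# The same codes describe the input format when parsing
d2 = as.Date("23/11/1990", format = "%d/%m/%Y")
d2 == d
```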

Recall, dates are internally recorded as the (integer) number of days since 1970-01-01. As a result, you can also create a Date object from integer data. One way to convert an integer variable to a Date object is to use the class() function

> my.date = 0
> class(my.date) = "Date"
> my.date
[1] "1970-01-01"

Another way is to use the as.Date() function with optional argument origin if the origin date is different than the default 1970-01-01. For example, to determine the date that is 32500 days from 1900-01-01 use

> as.Date(32500, origin=as.Date("1900-01-01"))
[1] "1988-12-25"
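A quick way to convince yourself of what origin does is to compare it with plain date arithmetic. The sketch below uses only base R; the object names d1 and d2 are arbitrary.

```r
# Using origin is equivalent to adding the day count to the origin date
d1 = as.Date(32500, origin = as.Date("1900-01-01"))
d2 = as.Date("1900-01-01") + 32500
d1 == d2

# origin may also be given as a character string in the default format
as.Date(25567, origin = "1900-01-01")   # 25567 days after 1900-01-01
```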

Extracting Information from Date objects

Consider the Date object

> my.date
[1] "1970-01-01"

Suppose I want to extract the year component from this object as a character string or as an integer. I can do this using the format() function:

> myYear = format(my.date, "%Y")
> myYear
[1] "1970"
> class(myYear)
[1] "character"
> as.numeric(myYear)
[1] 1970
> as.numeric(format(my.date, "%Y"))
[1] 1970

By specifying different format codes in the format() function, I can extract other components of the date such as the month or day.
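For instance, the month and day can be pulled out in the same way as the year. This is a small sketch with an arbitrary date; the names d, myMonth and myDay are made up for illustration.

```r
d = as.Date("2013-12-19")

# Month and day as integers, using the numeric format codes
myMonth = as.integer(format(d, "%m"))
myDay   = as.integer(format(d, "%d"))
myMonth
myDay

# Several components at once, still as a character string
format(d, "%Y-%m")
```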

Additionally, the weekdays(), months(), quarters() and julian() functions can be used to extract specific components of Date objects:

> weekdays(my.date)
[1] "Thursday"
> months(my.date)
[1] "January"
> quarters(my.date)
[1] "Q1"
> julian(my.date, origin=as.Date("1900-01-01"))
[1] 25567
attr(,"origin")
[1] "1900-01-01"

Manipulating Date Objects

Having a numeric representation for dates allows for some simple date arithmetic. For example,

> my.date
[1] "1970-01-01"
> my.date + 1
[1] "1970-01-02"
> my.date - 1
[1] "1969-12-31"
> my.date + 31
[1] "1970-02-01"

Logical comparisons can also be made:

> my.date
[1] "1970-01-01"
> my.date1 = as.Date("1980-01-01")
> my.date1 > my.date
[1] TRUE

Subtracting two Date objects creates a difftime object and shows the number of days between the two dates:

> diff.date = my.date1 - my.date
> diff.date
Time difference of 3652 days
> class(diff.date)
[1] "difftime"
> as.numeric(diff.date)
[1] 3652
> my.date + diff.date
[1] "1980-01-01"
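If the difference is wanted in units other than days, the base R difftime() function accepts a units argument. The sketch below repeats the two dates from the example under new names (d0, d1, in.days and in.weeks are arbitrary).

```r
d0 = as.Date("1970-01-01")
d1 = as.Date("1980-01-01")

# Same difference as d1 - d0, with the units stated explicitly
in.days  = difftime(d1, d0, units = "days")
in.weeks = difftime(d1, d0, units = "weeks")
in.days
in.weeks

# as.numeric() drops the difftime class and keeps the number
as.numeric(in.days)
as.numeric(in.weeks)
```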

Creating Date Sequences

Very often sequences of dates are required in the construction of time series objects. The base R function seq() (with method function seq.Date() for objects of class Date) can create many types of date sequences. The arguments to seq.Date() are

> args(seq.Date)
function (from, to, by, length.out = NULL, along.with = NULL, ...)

where from specifies the starting date, to specifies the ending date and by specifies the increment of the sequence. The by increment is a character string, containing one of "day", "week", "month", "quarter" or "year", and can be preceded by a (positive or negative) integer and a space, or followed by "s". For example, to create a bi-monthly sequence of Date objects starting on 1993-03-01 and ending on 2003-03-01 use

> my.dates = seq(as.Date("1993/3/1"), as.Date("2003/3/1"), "2 months")
> head(my.dates)
[1] "1993-03-01" "1993-05-01" "1993-07-01" "1993-09-01" "1993-11-01"
[6] "1994-01-01"
> tail(my.dates)
[1] "2002-05-01" "2002-07-01" "2002-09-01" "2002-11-01" "2003-01-01"
[6] "2003-03-01"

Alternatively, use

> my.dates = seq(from=as.Date("1993/3/1"), by="2 months", length.out=61)
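The two calls can be checked against each other, and the same by syntax handles other increments. This is a short sketch in base R; the object names s1, s2 and back are arbitrary.

```r
# The "to" form and the "length.out" form give the same 61 dates
s1 = seq(as.Date("1993/3/1"), as.Date("2003/3/1"), "2 months")
s2 = seq(from = as.Date("1993/3/1"), by = "2 months", length.out = 61)
length(s1)
all(s1 == s2)

# A negative integer in "by" steps backwards in time
back = seq(from = as.Date("2014-01-10"), by = "-1 week", length.out = 3)
back
```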

The seq() function can also be used to determine the date that is a specified number of days, weeks, months or years from a given date. For example, to find the date that is 5 months away from today's date use

> Sys.Date()
[1] "2014-01-10"
> seq(from=Sys.Date(), by="5 months", length.out=2)[2]
[1] "2014-06-10"

While the above is a clever solution, it is not very intuitive. The lubridate package, described later on, provides a much easier solution.
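Staying within base R for the moment, the seq() idiom can be wrapped in a small helper so that the intent is clearer at the call site. The function add.months below is hypothetical (it is not part of base R or of any package discussed here) and is only a sketch.

```r
# Hypothetical helper: the date that is n months after x, using seq()
add.months = function(x, n) {
  seq(from = x, by = paste(n, "months"), length.out = 2)[2]
}

add.months(as.Date("2014-01-10"), 5)

# Caveat: seq() keeps the day of the month fixed, so a day that does not
# exist in the target month is counted forward into the following month
end.jan = add.months(as.Date("2014-01-31"), 1)
end.jan
```

The second call shows why month arithmetic near the end of a month needs care: one month after January 31 does not land in February.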

Plotting Date Objects

Given a data set of Date objects, it is possible to graphically summarize the distribution of dates using the hist() function (with method function hist.Date()). For example, the following code simulates 500 random dates between 2013-01-01 and 2014-01-01 and plots a histogram summarizing the number of dates within each month:

> rint = round(runif(500)*365)
> startDate = as.Date("2013-01-01")
> myDates = startDate + rint
> head(myDates)
[1] "2013-10-05" "2013-10-23" "2013-11-20" "2013-05-27" "2013-07-11" "2013-06-07"
> hist(myDates, breaks="months", freq=TRUE,
+      main="Distribution of Dates by Month",
+      col="slateblue1", xlab="",
+      format="%b %Y", las=2)

The resulting histogram is shown in Figure 1.

Figure 1 Histogram of Date Objects
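The monthly counts behind such a histogram can also be tabulated numerically. The sketch below regenerates simulated dates in the same way; the seed and the object name counts are assumptions added here only so that the example is reproducible.

```r
set.seed(123)   # assumption: arbitrary seed, used only for reproducibility
rint = round(runif(500)*365)
myDates = as.Date("2013-01-01") + rint

# Number of simulated dates falling in each calendar month
counts = table(format(myDates, "%Y-%m"))
counts
sum(counts)
```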
