Predicting share price by using Multiple Linear Regression


A Bachelor Thesis in Mathematical Statistics

Gustaf Forslund & David Åkesson, Vehicle Engineering, KTH, May 21st, 2013

Abstract

The aim of the project was to design a multiple linear regression model and use it to predict the closing share price of 44 companies listed on the OMX Stockholm stock exchange's Large Cap list. The model is intended as a day-trading guideline, i.e. today's information is used to predict tomorrow's closing price. The regression was done in Microsoft Excel 2010[18] using its built-in function LINEST. The LINEST function uses the dependent variable y and all the covariates x to calculate the β-value belonging to each covariate. Several multiple linear regression models were created and tested, but only seven were better than chance, i.e. predicted the right direction more than 50 % of the time. To determine the most suitable of these seven models, Akaike's Information Criterion (AIC) was applied. The covariates used in the final model were: Dow Jones closing price, Shanghai opening price, conjuncture, oil price, share's opening price, share's highest price, share's lowest price, lending rate, reports, positive/negative insider trading, payday, positive/negative price target, number of completed transactions during one day, OMX Stockholm closing price, TCW index, increasing closing price three days in a row and decreasing closing price three days in a row.

The maximum average deviation between the predicted closing price and the real closing price over all 44 shares was 6.60 %. In predicting the correct direction (increase or decrease) of the 44 shares, an average of 61.72 % was achieved during the period 2012-02-22 to 2013-02-20. If 50,000 SEK had been invested in each company, i.e. a total investment of 2.2 million SEK, the total yield when using the regression model during the year 2012-02-22 to 2013-02-20 would have been 259,639 SEK (11.80 %), compared to 184,171 SEK (8.37 %) if the shares had been held without trading during the same period. Of the 44 companies analysed, 31 (70.45 %) were profitable when using the regression model during the year, compared to 30 (68.18 %) if the shares had never been sold during the same period. The yield from the model was 40.98 % higher than the yield from holding the shares for the year.

Table of Contents

Chapter 1 Introduction
  1.1 Introduction

Chapter 2 Theory
  2.1 Econometrics
    2.1.1 The Multiple Linear Regression Model theory
  2.2 Prediction
  2.3 Regression channels

Chapter 3 Data
  3.1 Covariates
    3.1.1 Stock exchanges in the world
    3.1.2 Conjuncture
    3.1.3 TCW index
    3.1.4 Lending rate
    3.1.5 Pay day
    3.1.6 Opening price
    3.1.7 Highest/lowest price of the day
    3.1.8 Positive/Negative insider trading
    3.1.9 Quarterly and annual reports
    3.1.10 Positive/negative price target
    3.1.11 Oil price
    3.1.12 Three positive/three negative days in a row
    3.1.13 P/E ratio
    3.1.14 Positive/Negative press releases
    3.1.15 Number of completed transactions
    3.1.16 OMX Stockholm closing price
    3.1.17 Split and reversed split
  3.2 Collecting data

Chapter 4 Modelling
  4.1 Modelling
  4.2 AIC test
    4.2.1 First model
    4.2.2 Second model
    4.2.3 Third model
    4.2.4 Fourth model
    4.2.5 Fifth model
    4.2.6 Sixth model
    4.2.7 Seventh model
  4.2 The final model

Chapter 5 Result
  5.1 Plots of the residual and R2 value
  5.2 Correct predicted directions
  5.3 Maximum and average deviation
  5.4 Investing

Chapter 6 Discussion
  6.1 Discussion

Chapter 7 Appendix
  7.1 Predicted closing price compared to real closing price
  8.2 Yield using the model
  8.3 Stock exchange indexes
  8.4 Conjuncture
  8.5 Oil price per barrel
  8.6 TCW index
  8.7 MATLAB code

Chapter 9 References
  9.1 References

Chapter 1 Introduction

1.1 Introduction

Most people around the world dream of having more money in their pockets. The trick, however, is not to work harder, but to work smarter. One way to make more money is to invest on the stock exchange, but because of its seemingly random and unpredictable nature, people are reluctant to do so. That first impression is not the entire truth. If the stock exchange is carefully analysed, a pattern slowly emerges and it becomes evident that a number of variables contribute to a company's share price. Such variables include positive and negative news, price targets, conjuncture, the oil price and the stock exchanges of other economically influential countries, to name just a few.

One of the most common share analysis tools used today is the so-called regression channel. These regression models are often based solely on closing price versus time and are more reminiscent of technical analysis than of a prediction of the share's closing price. This project aims to take it a step further by predicting a closing price for each day. The approach is to determine which variables influence a company's share price, design a multiple linear regression model and perform the prediction using Microsoft Excel 2010's[18] built-in function LINEST, in order to predict the closing price of 44 companies listed on the OMX Stockholm stock exchange's Large Cap list. The Large Cap list was at the time made up of 62 companies, but sufficient information was only found for 44 of them. Unlike regression channels, which can be used to forecast the direction of shares several days or even weeks ahead, this model is used to analyse share prices on a daily basis in a way that resembles day trading. The goal of the final model is to maximize the profit and minimize the losses based on a daily analysis during the period 2012-02-22 to 2013-02-20.

Since several multiple linear regression models containing different sets of covariates were to be designed, the Akaike Information Criterion (AIC) was used to determine the most suitable model. We set two criteria for the model. The first was that it should be better than chance in predicting whether the share would increase or decrease in value, i.e. more than 50 % of the predicted values should be in the correct direction. The second was that the predicted closing price should not deviate by more than 10 % from the real closing price.
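The two criteria can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the direction of a move is measured relative to the previous day's closing price, and a day with no change in either the predicted or the real price is counted as a miss. The thesis itself does not specify these details at this point.

```python
def direction_hit_rate(prev_close, predicted, actual):
    """Fraction of days on which the predicted and the real closing price
    moved in the same direction relative to the previous close.
    Assumption: a zero move on either side counts as a miss."""
    hits = 0
    for prev, pred, act in zip(prev_close, predicted, actual):
        if (pred - prev) * (act - prev) > 0:
            hits += 1
    return hits / len(actual)


def max_relative_deviation(predicted, actual):
    """Largest relative deviation of a predicted close from the real close."""
    return max(abs(p - a) / a for p, a in zip(predicted, actual))


def meets_criteria(prev_close, predicted, actual):
    """Both criteria: more than 50 % correct directions and at most 10 % deviation."""
    return (direction_hit_rate(prev_close, predicted, actual) > 0.5
            and max_relative_deviation(predicted, actual) <= 0.10)


# Made-up prices for four trading days (not data from the thesis).
prev_close = [100.0, 102.0, 101.0, 105.0]
predicted = [101.0, 101.0, 104.0, 106.0]
actual = [102.0, 101.0, 105.0, 103.0]
print(direction_hit_rate(prev_close, predicted, actual))
print(max_relative_deviation(predicted, actual))
print(meets_criteria(prev_close, predicted, actual))
```

In this made-up example three of the four directions are predicted correctly, so the first criterion is met, and the largest deviation is about 3 %, so the second is met as well.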


Chapter 2 Theory

2.1 Econometrics

The term "econometrics" is believed to have been coined by the Norwegian economist Ragnar Frisch (1895-1973). He was one of the three principal founders of the Econometric Society, the first editor of the journal Econometrica and a co-recipient of the first Nobel Memorial Prize in Economic Sciences in 1969[15]. Econometrics applies statistical methods to problems where the available data are observational rather than experimental, meaning that the data do not come from controlled and planned experiments. Common fields of application are economics, biology, medicine, social science and astronomy[16]. The last of these is a perfect example of a natural science where data are typically observational and not experimental.

2.1.1 The Multiple Linear Regression Model theory

The basic model for econometric work and modelling for experimental design is the multiple linear regression model[16]. The specification is

$$y_i = \sum_{j=0}^{k} x_{ij}\beta_j + e_i, \quad i = 1, \ldots, n \tag{2.1}$$

where $y_i$ is the observation of the dependent random variable $y$, whose expected value depends on the covariates $x_{Cj}$, where $C$ is a constant denoting that $i$ does not change. The $e_i$ are the error terms, which are assumed to be independent between observations and such that

$$E(e_i \mid \{x_{jk}\}) = 0 \tag{2.2}$$

and

$$E(e_i^2 \mid \{x_{jk}\}) = \sigma^2 \tag{2.3}$$

where $\sigma$ is unknown. Usually the covariate $x_{C0}$ is the constant 1 and $\beta_0$ is the intercept. If we write $x_i = (x_{i0} \ldots x_{ik})$, $i = 1, \ldots, n$, and $\beta = (\beta_0 \ldots \beta_k)^T$, the specified model may be written as

$$y_i = x_i\beta + e_i \tag{2.4}$$

The covariates may be deterministic (predetermined) values or outcomes of random variables[16].


Sometimes it is convenient to use the matrix notation

$$Y = X\beta + e \tag{2.5}$$

where

$$E(e \mid X) = 0 \quad \text{and} \quad E(ee^T \mid X) = I\sigma^2 \tag{2.6}$$

and where $Y$ is an $n \times 1$ matrix of random variables, $X$ is an $n \times (k+1)$ matrix and $e$ is an $n \times 1$ matrix of random variables.

In the regression model above the parameters $\beta_j$ and the variance $\sigma^2$ are unknown, and it is these parameters that are to be estimated from the obtained data. The model can be used either for prediction or to give a structural interpretation, which allows for hypothesis testing. Since the project aims at predicting the closing price of shares, only prediction is of interest here.
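The text does not state the estimator at this point. As a reminder, and assuming that $X^T X$ is invertible, the standard ordinary least-squares estimate of $\beta$ in the matrix model above, together with the resulting residual vector, is

```latex
\hat{\beta} = (X^T X)^{-1} X^T Y,
\qquad
\hat{e} = Y - X\hat{\beta}
```

This is the textbook least-squares solution. Excel's LINEST is documented as a least-squares fit, but how it computes the solution internally is not something this section relies on.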

2.2 Prediction

When performing a prediction the linear model is often used[16]. The covariates $x_0$ make up a row matrix, and with known covariates the predicted value $y_p$ of the corresponding $y$ is

$$y_p = x_0\hat{\beta}. \tag{2.7}$$

The prediction contains two sources of error: the estimate $\hat{\beta}$ is used in place of the true $\beta$, and the error term is set to zero in the prediction equation.

In reality the error term is never zero, so the prediction error is

$$e_p = e_0 + x_0(\beta - \hat{\beta}) \tag{2.8}$$

whose total variance is

$$\mathrm{Var}(e_p) = \left(1 + x_0 (X^T X)^{-1} x_0^T\right)\sigma^2 \tag{2.9}$$

which is estimated by

$$\widehat{\mathrm{Var}}(e_p) = \left(1 + x_0 (X^T X)^{-1} x_0^T\right)s^2 \tag{2.10}$$

where $s^2$ is an unbiased estimate of $\sigma^2$,

$$s^2 = \frac{1}{n - k - 1}\,|\hat{e}|^2 \tag{2.11}$$

where $n$ is the number of observations, $k$ is the number of covariates in the prediction model and $|\hat{e}|^2 = \hat{e}^T\hat{e}$, $\hat{e} = Y - X\hat{\beta}$.
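The prediction formulas (2.7), (2.10) and (2.11) can be illustrated with a small, dependency-free Python sketch. It is not the Excel LINEST workflow used in the project: it assumes the ordinary least-squares estimate obtained from the normal equations, solves them with a simple Gauss-Jordan elimination, and uses made-up data. All function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))


def solve(a, b):
    """Solve the square system a z = b by Gauss-Jordan elimination with
    partial pivoting. A singular matrix raises ZeroDivisionError."""
    n = len(a)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def predict_with_variance(X, y, x0):
    """Return (y_p, estimated Var(e_p)) for a new covariate row x0.

    X is the n x (k+1) design matrix whose first column is the constant 1,
    y holds the n observations and x0 is a row of length k+1."""
    columns = list(zip(*X))
    XtX = [[dot(ci, cj) for cj in columns] for ci in columns]
    XtY = [dot(c, y) for c in columns]
    beta_hat = solve(XtX, XtY)              # least-squares estimate (assumed)
    y_p = dot(x0, beta_hat)                 # equation (2.7)
    residuals = [yi - dot(row, beta_hat) for row, yi in zip(X, y)]
    n, k = len(X), len(X[0]) - 1
    s2 = dot(residuals, residuals) / (n - k - 1)   # equation (2.11)
    leverage = dot(x0, solve(XtX, x0))      # x0 (X^T X)^{-1} x0^T
    return y_p, (1 + leverage) * s2         # equation (2.10)


# Made-up data: an intercept column and a single covariate.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 2.0, 2.0, 3.0]
y_p, var_p = predict_with_variance(X, y, [1.0, 4.0])
print(y_p, var_p)
```

Solving the linear system twice instead of forming $(X^T X)^{-1}$ explicitly keeps the sketch short. For the many covariates of the final model a numerical library would be the more robust choice.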

2.3 Regression channels

On today's stock exchange one of the most common analysis tools is the regression channel. It uses historic values to forecast the future. The regression channel is based on a form of chaos theory, i.e. on trying to predict something that springs from total chaos. A metaphor illustrates how the regression channel works. Imagine a cigarette standing straight up in a room where the air is perfectly still. Chaos theory says that there is no way of predicting the trails and loops of the smoke as it leaves the cigarette. However, if 10,000 cigarettes were observed one after another, it would be noticed that the smoke trail of one cigarette never behaves in the same way as that of another, but also that the smoke trails never move outside a conical boundary on their way up in the air. In chaos theory this boundary is known as a chaotic attractor.

Regression channels are based on the same principle, but instead of a smoke trail they follow a share's closing price, and the channel's boundary plays the role of the chaotic attractor, which the share price is not allowed to cross for a longer period of time. If the share moves outside the regression channel, it indicates that an unforeseen event has occurred, such as positive or negative news or the release of a new price target, and that it is time to sell or buy the share.

One of the most common regression channels in use today is the Raff Regression Channel. It uses time and closing price to draw up the channel. A regression line is created by analysing a share's closing price over a certain number of days, for example 100 days. Once the regression line is drawn, two more lines are drawn parallel to it, one above and one below, at equal distance from the regression line, see Figure 2.1. The distance is set by the closing price that lies furthest from the regression line, above or below, during the 100 days analysed[12]. The top line is seen as resistance and the bottom line as support. The share may cross these two lines for a short moment, but if it stays outside for a longer period of time it indicates that a new trend is coming.
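The construction described above can be sketched as follows. This is a minimal illustration and not a reference implementation of the Raff Regression Channel: it assumes that time is measured as the day index 0, 1, 2, ..., that the regression line is an ordinary least-squares fit of closing price on that index, and that the channel half-width is the largest absolute distance between a closing price and the line. The function name and the prices are made up.

```python
def raff_channel(closes):
    """Return (line, upper, lower) for a list of daily closing prices."""
    n = len(closes)
    days = range(n)
    day_mean = (n - 1) / 2
    close_mean = sum(closes) / n

    # Least-squares slope and intercept of closing price versus day index.
    sxy = sum((d - day_mean) * (c - close_mean) for d, c in zip(days, closes))
    sxx = sum((d - day_mean) ** 2 for d in days)
    slope = sxy / sxx
    intercept = close_mean - slope * day_mean
    line = [intercept + slope * d for d in days]

    # Half-width: the closing price furthest from the line, above or below.
    width = max(abs(c - l) for c, l in zip(closes, line))
    upper = [l + width for l in line]   # read as resistance
    lower = [l - width for l in line]   # read as support
    return line, upper, lower


# Made-up closing prices for five days (the text uses 100 days as an example).
closes = [100.0, 102.0, 101.0, 104.0, 103.0]
line, upper, lower = raff_channel(closes)
for close, lo, mid, hi in zip(closes, lower, line, upper):
    print(f"close={close:.1f}  support={lo:.2f}  line={mid:.2f}  resistance={hi:.2f}")
```

By construction every closing price in the fitting window lies on or inside the channel. It is only later prices, compared against the extended lines, that can break out of it.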
