FORECASTING
INTRODUCTION
Forecasting is the prediction of events which are to occur in the future. There are many variables to be predicted in an organization, e.g.
- Demand /sales
- Costs
- Personnel requirements
- Future technology etc.
The need for forecasting arises from the fact that decisions are about the future, which in many cases is uncertain. Thus forecasting can be viewed as an effort to reduce uncertainty so as to make more accurate decisions. This gives the firm a competitive advantage in the market.
QUALITATIVE AND QUANTITATIVE TECHNIQUES
A convenient classification of forecasting techniques is between those that are broadly qualitative and those that are broadly quantitative. These classifications are by no means rigid or exclusive but serve
as a means of identification.
Quantitative Techniques
These are techniques of varying levels of statistical complexity which are based on analyzing past data
of the item to be forecast e.g. sales figures, stores issues, costs incurred. However sophisticated the technique used, there is the underlying assumption that past patterns will provide some guidance to the future. Clearly for many operational items (material usage, sales of existing products, costs) the past does serve as a guide to the future, but there are circumstances for which no data are available, e.g.
the launching of a completely new product, where other, more qualitative techniques are required.
These techniques are dealt with briefly first and then the detailed quantitative material follows.
Qualitative Techniques
These are techniques which are used when data are scarce, e.g. the first introduction of a new product. The techniques use human judgment and experience to turn qualitative information into quantitative estimates. Although qualitative techniques are used for both short and long term purposes, their use becomes of increasing importance as the time scale of the forecast lengthens. Even when past data are available (so that standard quantitative techniques can be used), longer-term forecasts require judgment, intuition, experience, flair etc, that is, qualitative factors, to make them more useful. As the time scale lengthens, past patterns become less and less meaningful. The qualitative methods briefly dealt with next are the Delphi Method, Market Research and Historical Analogy.
Delphi Method
This is a technique mainly used for longer term forecasting, designed to obtain expert consensus for a particular forecast without the problem of submitting to pressure to conform to a majority view. The procedure is that a panel of experts independently answer a sequence of questionnaires in which the responses to one questionnaire are used to produce the next questionnaire. Thus any information available to some experts and not others is passed on to all, so that their subsequent judgments are refined as more information and experience become available.
Market Research/Market Survey
These are widely used procedures involving opinion surveys, analyses of market data and questionnaires designed to gauge the reaction of the market to a particular product's design, price, colour etc. Market research is often very accurate for the relatively short term, but longer term forecasts based purely on surveys are likely to be suspect because people's attitudes and intentions change.
Historical Analogy
Where past data on a particular item are not available, e.g. for a new product, data on similar products are analysed to establish the life cycle and expected sales of the new product.
Clearly, considerable care is needed in using analogies which relate to different products in different time periods, but such techniques may be useful in forming a broad impression in the medium to long term.
Note on Qualitative Methods
a) When past quantitative data are unavailable for the item to be forecast, then inevitably much more judgment is involved in making forecasts.
b) Some of the qualitative techniques mentioned above use advanced statistical techniques, e.g. some of the sampling methods used in Market Research. Nevertheless, any such method may prove to be a relatively poor forecaster, purely due to the lack of appropriate quantitative data relating to the factor being forecast.
CORRELATION ANALYSIS
Before we study the quantitative forecasting techniques, it is necessary to study correlation, which, among other issues, is a way of selecting predictor variables in some priority fashion.
Correlation analysis is concerned with finding if there is a relationship between any 2 variables, say X and Y e.g.
- Level of advertising and sales (+ve)
- Interest rates and savings (+ve)
- Prices and demand for a good etc (-ve)
It seeks to answer 2 fundamental questions in forecasting:
1. What is the nature, if any, of the relationship between the 2 variables, i.e. in what direction is it: -ve or +ve?
2. What is the strength or the degree of the correlation between 2 variables? This enables ranking of predictor or independent variables to be carried out.
Methods of correlation
1. Scatter diagram.
2. Pearsonian co-efficient of correlation (rp)
3. Spearman's rank correlation coefficient (rs)
Scatter diagram
Also called a scatter graph, this is obtained by plotting values of X and Y on a graph. We have X (the independent variable) on the horizontal axis and Y (the dependent variable) on the vertical axis.
Example
The output X (in units) and the associated TC, Y for a product over 10 weeks are as follows, Y is in millions of shillings.
|No. |X |Y |
|1 |15 |180 |
|2 |12 |140 |
|3 |20 |230 |
|4 |17 |190 |
|5 |12 |160 |
|6 |25 |300 |
|7 |22 |270 |
|8 |9 |110 |
|9 |18 |240 |
|10 |30 |320 |
Issues of concern
1. Is there a relationship between X and Y?
2. If there is a relationship between X and Y can we obtain it in equation form so that if it persists into the future, we can use X to predict Y?
3. Can we assess the reliability and validity of the prediction model and the predictions it gives?
Scatter diagram
[Scatter diagram: Y (total cost) on the vertical axis against X (output) on the horizontal axis. The ten points rise steadily from about (9, 110) to (30, 320), suggesting a strong positive, roughly linear relationship.]
Importance of scatter diagram
1. It gives a measure of approximate correlation. In the example, generally, as X increases, Y also increases (+ve correlation). It is also a strong correlation.
2. It suggests the best mathematical or functional form for X and Y. In this example, a linear fit seems the best.
Other possibilities (scatter diagrams)
[Sketches of other scatter patterns:
1) Strong quadratic correlation, with one point marked as an outlier
2) Weak linear -ve correlation
3) No correlation]
3. A further usefulness of a scatter diagram is that it helps in identifying and hence weeding out outlier cases i.e. cases which are unlikely to recur e.g.
- Production when there was a strike
- Sales when there was a riot or a celebration.
Weaknesses of a scatter diagram
- It cannot be used to represent two or more independent variables simultaneously due to the limitation of the Cartesian plane.
Pearsonian Coefficient of Correlation (Product Moment - Coefficient of Correlation)
rp measures the nature and strength of the relationship between 2 variables on a scale from -1 to +1 such that:
If rp = -1 there is perfect indirect (inverse) correlation
rp = +1 there is perfect direct correlation
rp = 0 there is no correlation; these are variables which have no relationship with one another.
Formula
rp = Cov (x, y) / (Sx × Sy)
where Cov (x, y) = Σ(x - x̄)(y - ȳ)/n, and Sx, Sy are the standard deviations of x and y.
Cov = Covariance
Calculations for Cov (x, y)
|No |X |Y |x - x̄ |y - ȳ |(x - x̄)(y - ȳ) |
|1 |15 |180 |-3 |-34 |102 |
|2 |12 |140 |-6 |-74 |444 |
|3 |20 |230 |2 |16 |32 |
|4 |17 |190 |-1 |-24 |24 |
|5 |12 |160 |-6 |-54 |324 |
|6 |25 |300 |7 |86 |602 |
|7 |22 |270 |4 |56 |224 |
|8 |9 |110 |-9 |-104 |936 |
|9 |18 |240 |0 |26 |0 |
|10 |30 |320 |12 |106 |1272 |
|Σ |180 |2140 |0 |0 |3960 |
x̄ = Σx/n = 180/10 = 18 and ȳ = Σy/n = 2140/10 = 214
Cov (x, y) = Σ(x - x̄)(y - ȳ)/n = 3960/10 = 396
Interpretation – There is a +ve association between output, X and TC ,Y
Limitation – Co-variance is not rigidly defined i.e. it can take any value making interpretation and comparison difficult e.g. ranking of predictors is impossible.
Remedy
By dividing the Cov by the product of the standard deviations of x and y, we obtain a statistic which is strictly in the range -1 to +1. This is the Pearsonian coefficient of correlation (due to the English mathematician, Karl Pearson).
rp = Cov (x, y) / (Sx × Sy)
Sx = √[Σ(x - x̄)²/n] = √(376/10)
Sx = 6.13
Sy = √[Σ(y - ȳ)²/n] = √(43640/10)
Sy = 66.06    rp = 396/(6.13 × 66.06) = 0.98
Interpretation - There is a very strong +ve correlation between X (output) and Y (TC). Hence X is a very good predictor of Y.
Simplified Formula for rp
rp = [Σ(x - x̄)(y - ȳ)/n] / {√[Σ(x - x̄)²/n] × √[Σ(y - ȳ)²/n]}
Given that the n in the numerator and the ones in the denominator will cancel out, we can have a simplified formula thus:
rp = Σ(x - x̄)(y - ȳ) / √[Σ(x - x̄)² × Σ(y - ȳ)²]
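As a numerical check on the computation above, here is a minimal Python sketch (an illustration, not part of the original notes; variable names are my own) that evaluates Cov(x, y) and rp for the output/TC data using the simplified formula:

```python
from math import sqrt

x = [15, 12, 20, 17, 12, 25, 22, 9, 18, 30]              # output in units
y = [180, 140, 230, 190, 160, 300, 270, 110, 240, 320]   # total cost

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Covariance (divisor n, as in the notes) and the Pearsonian coefficient
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
cov_xy = s_xy / n
r_p = s_xy / sqrt(sum((xi - x_bar) ** 2 for xi in x) *
                  sum((yi - y_bar) ** 2 for yi in y))

print(cov_xy)          # 396.0
print(round(r_p, 2))   # 0.98
```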
Comments
1) r is unitless, thus it is possible to compare correlation for any types of units.
2) Since r is strictly from -1 to +1, the scale of the data is immaterial e.g. Suppose X is in Kgs and Y in Shs.
rp = Cov (x, y)/(Sx × Sy) = (Kgs × Shs)/(Kgs × Shs) = unitless
3) Maxim/Adage – High correlation does not mean causation but causation certainly implies high correlation. The fact that X and Y have high correlation does not mean changes in X cause changes in Y because it is possible to get a high correlation purely by chance (also known as a spurious or a non-sensical correlation)
- However, sometimes, a seemingly spurious correlation may be explainable through an intervening variable e.g.
X → Y → Z, where "→" means "causes changes in"
Hence X → Z, so that we expect a high correlation between X and Z, although it is not direct.
Y is the intervening variable e.g. alcohol → driver → accident.
4) Ranking of predictor variables.
Example
|Predictor |r |Rank |
|X1 |-0.99 |(1) |
|X2 |0.93 |(2) |
|X3 |-0.40 |(4) |
|X4 |0.52 |(3) |
NB: When ranking predictor or independent variables, the direction of the correlation is immaterial. What only matters is the size or the magnitude of the correlation.
- The pearsonian coefficient of correlation is also called Product moment coefficient of correlation.
SPEARMAN'S COEFFICIENT OF RANK CORRELATION, rs
This coefficient of correlation was developed by the British psychologist, Charles Edward Spearman in 1904. It is most suitable when data is of ordinal nature i.e. where only ranking is possible.
Types of data
1) Nominal – These are labels e.g. car no’s in a safari rally.
2) Ordinal – e.g. ranking of school plays or beauty contest candidates.
3) Interval – e.g. temperature scale – has no natural zero (the zero is arbitrary)
4) Ratio – e.g. marks scored in an exam or the age of a person – has natural zero.
Formula
rs = 1 - 6Σd² / [n(n² - 1)]    where d = difference between the pairs of ranked values
n = no. of pairs of rankings
Like the product-moment coefficient of correlation, rs can take values from -1 to +1.
Example
Two managers are asked to rank a group of employees in order of potential for promotion.
The rankings are as follows:
|Employee |Manager 1 (R1) |Manager 2 (R2) |(R1 - R2)² = d² |
|A |10 |9 |1 |
|B |2 |4 |4 |
|C |1 |2 |1 |
|D |4 |3 |1 |
|E |3 |1 |4 |
|F |6 |5 |1 |
|G |5 |6 |1 |
|H |8 |8 |0 |
|I |7 |7 |0 |
|J |9 |10 |_1_ |
| | |Σd² = |14 |
Required:
Compute the coefficient of rank correlation and comment on the value.
Solution
rs = 1 - 6Σd² / [n(n² - 1)] = 1 - 6(14) / [10(10² - 1)] = 1 - 84/990
rs = 0.92
Comment
There is a very high +ve correlation between the ranking of the 2 managers, meaning they largely agree.
rs for tied rankings
The formula is modified to be:
rs = 1 - 6[Σd² + Σ t(t² - 1)/12] / [n(n² - 1)]
where t = number of tied rankings in a group of ties (the correction term is summed over all such groups)
Example on tied rankings - More than one tie
Two judges in a schools drama festival awarded marks to eight schools in order of preference. From these, determine Spearman's coefficient of rank correlation, rs. Hence, comment on whether the two judges generally agree on the schools' performance or not.
|School |Judge 1 |Judge 2 |
| | | |
|A |72 |69 |
|B |79 |83 |
|C |77 |81 |
|D |77 |69 |
|E |73 |68 |
|F |71 |66 |
|G |77 |65 |
|H |70 |81 |
Solution
|Ranking the schools | | |Averaging ties and calculation of d² | | | |
|School |R1 |R2 |School |R1 |R2 |d² |
|A |6 |4 |A |6 |4.5 |2.25 |
|B |1 |1 |B |1 |1 |0 |
|C |2 |2 |C |3 |2.5 |0.25 |
|D |2 |4 |D |3 |4.5 |2.25 |
|E |5 |6 |E |5 |6 |1 |
|F |7 |7 |F |7 |7 |0 |
|G |2 |8 |G |3 |8 |25 |
|H |8 |2 |H |8 |2.5 |30.25 |
| | | | | | |Σd² = 61 |
Σd² = 61
Tie corrections: Judge 1 has one group of 3 tied ranks (the three schools scoring 77): 3(3² - 1)/12 = 2
Judge 2 has two groups of 2 tied ranks each: 2 × [2(2² - 1)/12] = 1
Adjusted total = 61 + 2 + 1 = 64
rs = 1 - 6(64) / [8(8² - 1)] = 1 - 384/504 = 0.24
Comment
There is a very low +ve correlation between the marks awarded to the schools by the two judges, meaning that generally, they do not agree on the schools’ performances.
Example
The scores of 8 Sec. 4 students in QA and Tax tests were as follows:
| |Score % |
|Student |QA |Tax |
|I |(2) 70 | (3) 72 |
|II |(7) 41 |(6) 54 |
|III |(6) 45 |(4) 66 |
|IV |(1) 83 |(2) 80 |
|V |(3) 65 |(5) 62 |
|VI |(3) 65 |(1) 89 |
|VII |(5) 59 |(8) 45 |
|VIII |(8) 39 |(7) 50 |
|( ) = Ranking |
Required:
Do those students who score highly in QA also do so in Tax or is it vice versa?
Use spearman’s rank correlation coefficient , [pic]
Solution
For tied rankings, the rank positions are added and divided by their number (so the two students tied in QA share the rank (3 + 4)/2 = 3.5).
| |Ranks | |
|Student |QA |Tax |d² |
|I |2 |3 |1 |
|II |7 |6 |1 |
|III |6 |4 |4 |
|IV |1 |2 |1 |
|V |3.5 |5 |2.25 |
|VI |3.5 |1 |6.25 |
|VII |5 |8 |9 |
|VIII |8 |7 |1 |
| | | |Σd² = 25.5 |
rs = 1 - 6[Σd² + t(t² - 1)/12] / [n(n² - 1)]    t = 2 (no. of tied rankings)
rs = 1 - 6[25.5 + 2(2² - 1)/12] / [8(8² - 1)] = 1 - 6(26)/504
rs = 1 - 156/504 = 0.69
Comment
There is a fairly high +ve correlation between scores in QA and those in Tax. Hence, generally, a student who does well in QA also does well in Tax and vice versa.
NB: 1) If data is of ordinal scale, we can only calculate Spearman's rank correlation coefficient, rs.
2) For interval and ratio data, both Pearsonian and Spearman’s coefficients can be calculated although it is preferable to use Pearsonian whenever possible. Hence in such a circumstance (interval and ratio data), whenever the correlation coefficient is not specified, use Pearsonian coefficient e.g. in exams.
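As a numerical check, the sketch below (illustrative Python, not from the notes; the helper names are my own) ranks two score lists, giving tied scores the average of their positions, and applies the tie-corrected formula. Run on the QA/Tax scores it returns approximately 0.69.

```python
def average_ranks(scores):
    # Rank with the highest score = 1; tied scores share the average of their positions
    order = sorted(scores, reverse=True)
    return [sum(i + 1 for i, s in enumerate(order) if s == v) / order.count(v)
            for v in scores]

def spearman_rs(a, b):
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    # Tie correction: sum t(t^2 - 1)/12 over each group of tied values in each list
    def correction(scores):
        return sum(scores.count(v) * (scores.count(v) ** 2 - 1) / 12
                   for v in set(scores) if scores.count(v) > 1)
    total = d2 + correction(a) + correction(b)
    return 1 - 6 * total / (n * (n ** 2 - 1))

qa  = [70, 41, 45, 83, 65, 65, 59, 39]
tax = [72, 54, 66, 80, 62, 89, 45, 50]
print(round(spearman_rs(qa, tax), 2))   # approx. 0.69
```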
QUANTITATIVE FORECASTING TECHNIQUES
Causal methods
These are methods whereby changes in one or more variable(s) cause changes in another variable.
e.g. demand being a function of price of product, prices of substitutes and complements, income etc.
The techniques include: Graphical methods (Hi-low and visual fit), Regression analysis and Econometric methods.
Relationship between variables
There is a relationship between variables when changes in one factor appear to be related in some way to movements in one or several other factors, e.g.
i) A marketing manager may observe that sales change when there has been a change in advertising expenditure.
ii) The transport manager may notice that vans and lorries consume more fuel when they travel longer distances.
Questions which arise to the manager or the analyst.
1. Are the movements in the same or opposite directions?
Same direction means +ve correlation
Opposite direction means -ve correlation
2. Would movements in one variable be causing or being caused by movements in another variable? i.e. is there a cause and effect relationship?
Note: High correlation does not mean causation but causation certainly implies high correlation. It is possible to have high correlation purely by chance (also called a spurious or a nonsensical
correlation).
3. Would apparently related movements come purely by chance?
4. Would movements in one variable be a result of combined movements in several other variables?
5. Would movements in 2 factors be related not directly but through movements in a 3rd variable (an intervening variable)?
Frequently the manager or analyst is interested in prediction of some kind e.g. marketing manager is interested in factors affecting sales.
GRAPHICAL TECHNIQUES
1. High-low method
For this method, we obtain a straight line connecting the highest and the lowest values of the observations according to the predictor variable x
Example.
Data on output X & TC (Y)
|No. |X |Y |
|1 |15 |180 |
|2 |12 |140 |
|3 |20 |230 |
|4 |17 |190 |
|5 |12 |160 |
|6 |25 |300 |
|7 |22 |270 |
|8 |9 |110 |
|9 |18 |240 |
|10 |30 |320 |
Hi- point X = 30 Y = 320
Low-point X = 9 Y = 110
Objective
Calculate a and b to fit the linear function y = a + bX
Equations
1) 320 = a + 30b
2) 110 = a + 9b
(1) - (2): 210 = 21b, so b = 210/21 = 10 .............. unit variable cost
Substitute b = 10 into (1): 320 = a + 30(10), so a = 320 - 300 = 20 .............. fixed cost
Hence the cost prediction function is
Y = 20 + 10X
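A minimal Python sketch of the high-low computation (illustrative only; the function name is my own):

```python
def high_low(points):
    """Fit y = a + b*x through the lowest-x and highest-x observations."""
    x_lo, y_lo = min(points)            # lowest activity level
    x_hi, y_hi = max(points)            # highest activity level
    b = (y_hi - y_lo) / (x_hi - x_lo)   # unit variable cost
    a = y_hi - b * x_hi                 # fixed cost
    return a, b

data = [(15, 180), (12, 140), (20, 230), (17, 190), (12, 160),
        (25, 300), (22, 270), (9, 110), (18, 240), (30, 320)]
a, b = high_low(data)
print(a, b)   # 20.0 10.0, i.e. Y = 20 + 10X
```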
Strength
Hi – low method is simple to use and to explain to non-mathematical users.
Weaknesses
1) Assumes the relationship is linear.
2) Ignores all points except 2 of them.
Outlier cases can have serious distortion effects if not excluded, since the fitted line may then fail to fall in the general pattern of the data.
[Sketch: a scatter of points rising steadily, with the high point lying well above the general pattern (an outlier); the line joining the low point to this outlier misses the bulk of the observations.]
3) For the Hi-low method, we cannot measure the size of probable error concerning model parameter estimates and predictions made using the model (we cannot make probability statements concerning these values).
- Probability statements are confidence interval construction and hypothesis tests.
ŷ = a + bx ..................... model
Y = A + BX or Y = α + βX ....... reality
2. Visual fit method
This method utilizes a scatter diagram whereby the analyst uses her/his judgment to obtain the best fit by considering the general pattern for all observations.
[Scatter diagram: the analyst draws a straight line by eye through the general pattern of the points, with the outlier ignored.]
Strengths
1) Improves on the Hi-Low method in the sense that it considers all observations simultaneously, thereby ignoring outliers.
2) Does not assume linearity unlike the high-low method
[Scatter diagram: points following a curved (non-linear) pattern, through which a freehand curve can be fitted.]
Weaknesses
1. Size of probable error cannot be measured.
2. It is subjective to some extent – the less clear the pattern, the greater this subjectivity is.
3. Assumes a single predictor variable
REGRESSION ANALYSIS
This is a technique which uses a statistical model to measure the average amount of change in one variable, y (the dependent variable), that is associated with unit changes in one or more independent or predictor variables, x1, x2, --------, xn.
When certain conditions or assumptions hold, it is possible to measure the size of probable error.
Types
1. Simple regression - is concerned with a single independent variable
2. Multiple regression - is concerned with 2 or more independent variables.
SIMPLE REGRESSION ANALYSIS (LINEAR FUNCTION)
For simple regression analysis, we are interested in 3 issues:
1) Obtaining parameter estimates a and b for the linear function
y = a + bx, given a set of pairs of observations of x and y.
2) Evaluation of the regression equation – how good is it as a prediction model?
3) What assumptions are made in regression analysis and how well do they hold in practice for
the model in question?
Derivation of the Regression Equation or
Line of Best Fit or Ordinary Least Squares (OLS) Line
Consider the following scatter diagram:
[Sketch: the fitted line ŷ = a + bx drawn through the scatter of points; for each observation the error is the vertical distance between the actual value y and the predicted value ŷ on the line.]
Let
ŷ = predicted value
y = actual or historical or observed value
The line of best fit is obtained by minimizing the sum of square error, SSE = Σ e 2 = Σ (y – ŷ)2
We use differentiation
SSE = Σ{y - (a + bx)}²
SSE = Σ(y - a - bx)²
Objective: To find values of a and b which minimize SSE
FOC:
∂SSE/∂a = -2Σ(y - a - bx) = 0
Σ(y - a - bx) = 0
Σy - Σa - Σbx = 0
Σy = Σa + Σbx
But Σa = na and Σbx = bΣx
Thus: Σy = na + bΣx --------- normal equation (1)
∂SSE/∂b = -2Σx(y - a - bx) = 0
Σx(y - a - bx) = 0
Σxy - Σax - Σbx² = 0
Σxy = Σax + Σbx²
Thus: Σxy = aΣx + bΣx² ---------- normal equation (2)
SOC: ∂²SSE/∂a² = 2n > 0
∂²SSE/∂b² = 2Σx² > 0
Since both second derivatives are positive, the turning point is a minimum.
Thus by obtaining the appropriate Σs, we substitute in the normal equations and solve simultaneously to obtain values of a and b.
A way of remembering the normal equations
Consider the linear function:
y = a + bx
Normal equation 1
Take Σs thus: Σy = Σa + Σbx
Then: Σy = na + bΣx
Normal equation 2
Multiply the linear function by x and take Σs thus:
Σxy = Σax + Σbx²
Hence: Σxy = aΣx + bΣx²
Example
Determine the OLS line for the data on output Vs TC and interpret a and b
OLS calculations
|No. |x |y |xy |x² |
|1 |15 |180 |2700 |225 |
|2 |12 |140 |1680 |144 |
|3 |20 |230 |4600 |400 |
|4 |17 |190 |3230 |289 |
|5 |12 |160 |1920 |144 |
|6 |25 |300 |7500 |625 |
|7 |22 |270 |5940 |484 |
|8 |9 |110 |990 |81 |
|9 |18 |240 |4320 |324 |
|10 |30 |320 |9600 |900 |
|Σ |180 |2140 |42480 |3616 |
| |Σx |Σy |Σxy |Σx² |
The normal equations
Σy = na + bΣx
Σxy = aΣx + bΣx²
Substitution
2140 = 10a + 180 b
42480 = 180 a + 3616 b
Rearranging:
10 a + 180 b = 2140
180 a + 3616 b = 42480
Matrix format and use of Cramer's Rule
[10 180; 180 3616] [a; b] = [2140; 42480]
Determinant of the coefficient matrix = 10(3616) - 180(180) = 36160 - 32400 = 3760
a = [2140(3616) - 180(42480)] / 3760 = 91840/3760 = Sh. 24.43 ........ fixed cost
b = [10(42480) - 180(2140)] / 3760 = 39600/3760 = Sh. 10.53 ........ variable cost per unit
Hence ŷ = 24.43 + 10.53x
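The normal equations can also be solved directly in a few lines of code. The sketch below (illustrative Python, not part of the notes) builds the required Σs and applies Cramer's Rule to the output/TC data:

```python
x = [15, 12, 20, 17, 12, 25, 22, 9, 18, 30]
y = [180, 140, 230, 190, 160, 300, 270, 110, 240, 320]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

# Normal equations:  sum_y  = n*a     + b*sum_x
#                    sum_xy = a*sum_x + b*sum_x2
det = n * sum_x2 - sum_x ** 2
a = (sum_y * sum_x2 - sum_x * sum_xy) / det
b = (n * sum_xy - sum_x * sum_y) / det

print(round(a, 2), round(b, 2))   # approx. 24.43 and 10.53
```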
Caution on Relevant Range
The prediction model is most dependable i.e. most accurate within the relevant range. This is the span of activity or volume which encompasses all the observations used to develop the model.
[Sketch: for this example the relevant range runs from x = 9 to x = 30 units.]
Extrapolation outside the relevant range may give inaccurate predictions for 2 reasons:
1) The relationship outside the relevant range may be different from that within the relevant range.
2) There may not even be a relationship between x and y outside the relevant range.
EVALUATION OF REGRESSION MODELS
The major concern for any model is how reliable or valid it is for what it is meant to do.
There are 3 aspects in the evaluation of a regression model.
1) Economic plausibility
2) Goodness of fit
3) Specification analysis (validity of regression assumptions)
1. Economic Plausibility
There should exist a logically explainable relationship between the dependent variable and the independent variable. This includes the theoretical expectation (or the direction) of the coefficient, i.e. the sign.
E.g. Savings = f {income (+ve), household size (-ve), interest rates (+ve), --------}
Justification of predictor variables is through research, where the researcher begins by reviewing the relevant literature.
2. Goodness of Fit Tests
These measure how well the regression equation fits the data set from which it was derived.
They are 2 broad types: those for
i) Whole (full) model (macro level)
ii) The slope (micro level)
i) The full model
These measure how well all predictors taken together predict the dependent variable and there are 3 types:
i) Coefficient of determination, r2 or R2
ii) Standard error of estimate, Se -------------------- also called standard error of regression
iii) F statistic
ii) The slope ( b coefficient)
These measure how well a particular independent variable predicts the dependent variable if all other predictors are not included or are assumed constant.
The measures
i) Coefficient of correlation, r
ii) Standard error of the slope, Sb
iii) t or z statistic
Note that the measures for the full model for a given regression are only one each while for the slope, there are as many as the independent variables as depicted below.
ŷ = a + b1x1 + b2x2 + --------------
Full model: R² (or r²), Se, F
Slope of x1: r1, Sb1, t1 (or z1)
Slope of x2: r2, Sb2, t2 (or z2), and so on for further predictors.
i) Coefficient of Determination, r2
r2 measures the amount of variation in the dependent variable, y which is explained by
the variation in the independent variable (s) x1, x2 --------- on a scale from 0% to 100%
i.e. 0% ≤ r² ≤ 100%
Generally, the higher the value of r2 the better the prediction model. Thus, r2 is used to rank prediction models.
For simple regression, i.e. a single predictor variable x, r² is the square of r, where
r = product-moment (Pearsonian) coefficient of correlation
For multiple regression, there is no connection between R2 and individual values of r.
Calculations for r2
Consider the following scatter diagram:
[Sketch: the fitted line ŷ = a + bx, a typical observed point y, its predicted value ŷ on the line and the mean ȳ; the deviations y - ȳ, y - ŷ and ŷ - ȳ are marked.]
Standard symbols
y = actual/observed/historical value
ŷ = predicted value
ȳ = average (mean) value = Σy/n
Sum of squares (SS)
Σ(y - ȳ)² = Total SS (SST)
Σ(y - ŷ)² = SS due to error (SSE). Also called unexplained SS
Σ(ŷ - ȳ)² = SS due to Regression (SSR). Also called explained SS
Explained SS + unexplained SS = Total SS
Using symbols:
SSR + SSE = SST
These SS can take any positive value, so they are not useful in terms of comparison of equations or models.
To achieve this, we force the RHS to equal one by dividing every term by SST thus:
SSR/SST + SSE/SST = SST/SST = 1
This translates to:
Explained variance + unexplained variance = 1
Thus:
Explained variance = 1 - unexplained variance OR r² = 1 - SSE/SST. Also, r² = SSR/SST
Example:
Determine r2 for the data on output, X vs TC, Y and interpret it.
Table for Calculations of r2
|No |x |y |ŷ |(y - ȳ)² |(y - ŷ)² |
|1 |15 |180 |182.38 |1156 |5.6644 |
|2 |12 |140 |150.79 |5476 |116.4241 |
|3 |20 |230 |235.03 |256 |25.3009 |
|4 |17 |190 |203.44 |576 |180.6336 |
|5 |12 |160 |150.79 |2916 |84.8241 |
|6 |25 |300 |287.68 |7396 |151.7824 |
|7 |22 |270 |256.09 |3136 |193.4881 |
|8 |9 |110 |119.2 |10816 |84.64 |
|9 |18 |240 |213.97 |676 |677.5609 |
|10 |30 |320 |340.33 |11236 |413.3089 |
| |∑x = 180 |∑y = 2140 | |SST = 43640 |SSE = 1933.6274 |
Recall that the OLS equation is:
ŷ = 24.43 + 10.53x
r² = 1 - SSE/SST = 1 - 1933.6274/43640
r² = 0.9557 ≈ 96%
Interpretation
There is no hard and fast rule on when to say a regression model is good or otherwise. There is however, a rule of thumb used by practitioners as follows:
Rule of Thumb
For r2 : 75% - 100% = Very good model
60% - 74% = Good model
50% - 59% = Satisfactory model
Below 50% = Poor model
Thus, 96% of the variance in y (TC) is explained by the variation in x (output) and according to the rule of thumb, this is a very good prediction model.
Unexplained variance = 1 - r2 = 1 - 0.96 = 4%
This is accounted for by two factors:
1) possible predictor/explanatory variables not included in the model.
2) Pure chance factors: Even when all explanatory variables have been included in a model there will still be a disturbance term which cannot be completely eliminated e.g. measurement error.
Thus:
ŷ = a + b1x1 + b2x2 + -------- + e
The explained portion (r²) is due to the regression, while the unexplained portion (1 - r²) sits in the residual, e, and is accounted for by other predictors not included in the model and by pure chance factors.
ii) Standard Error of Estimate, Se
Also called the standard error of regression, Se is a measure of dispersion or spread of a predicted value of y and it is calculated thus:
Se = √[SSE/(n - k)]
where n = sample size
k = no. of parameters estimated
n - k = no. of degrees of freedom, v
For the example:
SSE = 1933.6274
Sample size, n = 10
k = 2 ---------- these are the intercept and the slope.
n - k = 10 - 2 = 8
Se = √(1933.6274/8) = √241.70
Se = Sh. 15.55
Interpretation
Generally, the smaller the value of Se (or any other error), the better the model. Se can also be used to construct a confidence interval on a predicted value of y thus:
CI = ŷ ± (t or z) × Se
Example:
For output, x = 31 units, determine the forecast for TC, y and calculate its 95% C I. Calculations to the nearest shilling.
Solution:
For x = 31, ŷ = 24.43 + 10.53(31)
= Sh. 351
n = 10 (small sample)
Hence use the t statistic with df = 8
For α = 5%, t critical = 2.31
95% CI = 351 ± 2.31 × 15.55
= 351 ± 36
OR 315 < y < 387
iii) F Statistic
Named in honour of the British statistician, Sir Ronald Fisher, this is a statistic which essentially compares SSR to SSE.
Further, it enables a hypothesis test to be carried out on the significance of the regression model.
Formula: F = [SSR/(k - 1)] / [SSE/(n - k)] OR F = MSR/MSE
where k - 1 = v1 = numerator df
n - k = v2 = denominator df
Some characteristics: F ≥ 0 and it is closely related to the chi-square (χ²) distribution. It is skewed to the right.
[Sketch: the F distribution, a right-skewed curve of P(F) against F.]
Generally, the higher the value of F the better the predictor model.
For the example:
F = [SSR/(k - 1)] / [SSE/(n - k)] = (41706.383/1) / (1933.627/8)
F = 41706.383/241.702 ≈ 173
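The three full-model measures all follow from the sums of squares. A small Python sketch (illustrative; figures from the worked example) pulling r², Se and F together:

```python
from math import sqrt

x = [15, 12, 20, 17, 12, 25, 22, 9, 18, 30]
y = [180, 140, 230, 190, 160, 300, 270, 110, 240, 320]
a, b = 24.4255, 10.5319       # OLS estimates from the normal equations
n, k = len(y), 2              # k = parameters estimated (intercept and slope)

y_hat = [a + b * xi for xi in x]
y_bar = sum(y) / n

sst = sum((yi - y_bar) ** 2 for yi in y)                 # total SS
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # unexplained SS
ssr = sst - sse                                          # explained SS

r2 = 1 - sse / sst                        # coefficient of determination, approx. 0.9557
se = sqrt(sse / (n - k))                  # standard error of estimate, approx. 15.55
f  = (ssr / (k - 1)) / (sse / (n - k))    # F statistic, approx. 172.6
print(round(r2, 4), round(se, 2), round(f, 1))
```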
HT on Significance of Regression Model
1. Hypotheses
Ho: Regression model is not statistically significant (not suitable)
H1 : Regression model is statistically significant (suitable)
2. Level of significance
Let α = 5%
3. Test statistic
F = MSR/MSE = [SSR/(k - 1)] / [SSE/(n - k)]
4. Decision Criteria
Numerator degrees of freedom v1 = k-1 = 2 - 1 = 1
Denominator df, v2 = n - k = 10 - 2 = 8, hence F critical = 5.32
[Sketch: F distribution with the upper 5% rejection region lying beyond F = 5.32; the lower 95% is the fail-to-reject region.]
Rule: Reject Ho if F calculated is > 5.32, otherwise fail to reject it.
5. Data Analysis/Calculations
F calculated = 173
6. Conclusions
a) Statistical conclusion
Reject Ho since F calculated = 173 is greater than F critical of 5.32
b) Management decision
The model is statistically significant and hence it is a suitable prediction model.
Goodness of Fit Measures for the Slope
i) Coefficient of Correlation
This is already covered in correlation analysis topic earlier
ii) Standard Error of the Slope, Sb
Sb is a measure of dispersion of the estimate of the slope and enables statistical inferences to be carried out regarding the estimate of the slope.
Formula
Sb = Se / √[Σ(x - x̄)²]
where Se = standard error of regression or standard error of estimate, whereas Σ(x - x̄)² = the dispersion of x about its mean.
For the example:
Se = Sh 15.55
Calculation of Σ(x - x̄)²:
|No |x |x - x̄ |(x - x̄)² |
|1 |15 |-3 |9 |
|2 |12 |-6 |36 |
|3 |20 |2 |4 |
|4 |17 |-1 |1 |
|5 |12 |-6 |36 |
|6 |25 |7 |49 |
|7 |22 |4 |16 |
|8 |9 |-9 |81 |
|9 |18 |0 |0 |
|10 |30 |12 |144 |
| |Σx = 180 |- |Σ(x - x̄)² = 376 |
From the table, we find that Σ(x - x̄)² = 376
Thus: Sb = 15.55/√376 = 15.55/19.39 = Sh. 0.80
Interpretation
Generally, the lower the value of Sb, the better the predictor variable concerned.
Sb can be used to obtain an interval estimate of the value of the slope thus:
Example:
Obtain the 95% C I on the estimate of the slope. Workings to 2 dps.
Solution:
b = Sh. 10.53, t critical = 2.31
95% CI = 10.53 ± 2.31 × 0.80
= 10.53 ± 1.85
OR 8.68 < B < 12.38
Comment - the fact that the interval does not include zero means that under the null hypothesis B = 0, we would reject Ho and conclude that this is a significant slope, so that output is a suitable predictor variable of total cost.
iii) Z or t statistic
This is a statistic which is sufficiently powerful to enable a HT to be carried out regarding the
estimate of the slope.
Z or t = (b - B)/Sb, which under Ho: B = 0 reduces to b/Sb
For the example:
b = Sh. 10.53 Sb = Sh.0.80
t = 10.53/0.80 = 13.1625
Comment - the higher the value of t in absolute terms (i.e. ignore the sign), the better the predictor variable concerned
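A short Python sketch (illustrative) of the slope diagnostics - Sb, the 95% confidence interval and the t statistic - using the figures from the example; the t critical value of 2.31 is taken from tables for 8 degrees of freedom:

```python
from math import sqrt

se = 15.55                     # standard error of estimate
b = 10.53                      # estimated slope
x = [15, 12, 20, 17, 12, 25, 22, 9, 18, 30]
x_bar = sum(x) / len(x)

s_b = se / sqrt(sum((xi - x_bar) ** 2 for xi in x))   # standard error of the slope
t_crit = 2.31                                         # t table value, df = 8, 5% two-tail
ci = (b - t_crit * s_b, b + t_crit * s_b)             # 95% CI on the slope
t_calc = b / s_b                                      # test statistic for Ho: B = 0

print(round(s_b, 2), [round(v, 2) for v in ci], round(t_calc, 2))
# approx. 0.8, [8.68, 12.38], 13.13
```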
Hypothesis test on significance of the slope
1) Hypothesis
Ho: B = 0 ------- slope is not statistically significant i.e. X (output) is not a suitable predictor of Y (TC)
H1: B ≠ 0 --------- slope is statistically significant i.e. X is a suitable predictor of Y.
2) Level of significance
Let α = 5%
3) Test statistic
Use the t test since the sample size is small (n = 10)
t = b/Sb
4) Decision criteria
For α = 5% and df = n - k = 8, t critical = 2.31
[Sketch: t distribution with 2.5% rejection regions below -2.31 and above +2.31; the middle 95% is the fail-to-reject region.]
Rule: Reject Ho if t calc is greater than t critical
i.e. t calc > + 2.31 or t calc < - 2.31
5) Data Analysis/Calculations
t calc = 13.1625
6) Conclusions
a) Statistical conclusion
Reject Ho since t calc = 13.1625 is greater than
t critical = + 2.31
b) Managerial decision
Output, X is a significant or a suitable predictor of TC, Y
COMPUTER OUTPUT SOLUTION
Program: Forecasting/Simple Regression.
Problem Title: MBA – REG - EXAMPLE
* * * * * Input Data * * * * *
|Obs |Y |X |
|1 |180.000 |15.000 |
|2 |140.000 |12.000 |
|3 |230.000 |20.000 |
|4 |190.000 |17.000 |
|5 |160.000 |12.000 |
|6 |300.000 |25.000 |
|7 |270.000 |22.000 |
|8 |110.000 | 9.000 |
|9 |240.000 |18.000 |
|10 |320.000 |30.000 |
* * * * * Program Output * * * * *
|Parameter |Coefficient |SE B |T |
|Intercept |24.4255 |15.2462 | 1.6021 |
|b 1 |10.5319 |0.8018 |13.1359 |
Coefficient of determination : 0.9557
Correlation coefficient : 0.9776
Standard Error : 15.5468
Prediction Error
|Obs |Observed Value |Predicted Value |Residual |
|1 |180.000 |182.404 | -2.404 |
|2 |140.000 |150.809 |-10.809 |
|3 |230.000 |235.064 |-5.064 |
|4 |190.000 |203.468 |-13.468 |
|5 |160.000 |150.809 |9.191 |
|6 |300.000 |287.723 |12.277 |
|7 |270.000 |256.128 |13.872 |
|8 |110.000 |119.213 |-9.213 |
|9 |240.000 |214.000 |26.000 |
|10 |320.000 |340.383 |-20.383 |
Mean Absolute Deviation (MAD) : 13.6312
ANOVA Table
|Source of Variation |SS |df |MS |
|Regression | 41706.383 |1 |41706.383 |
|Residual |1933.617 |8 |241.702 |
|Total |43640.000 |9 | |
F* = 172.553
VALIDITY OF REGRESSION ASSUMPTIONS
Much as regression analysis is a powerful prediction tool, the model would not be valid if the assumptions or conditions or specifications underlying the technique are violated; hence the assessment of these assumptions is imperative.
The assumptions:
1. Linearity - the mathematical relationships between the variables are assumed to be of
the 1st degree i.e. linear.
2. Error Term (residual) is assumed to be
a) Normally distributed with
b) Zero mean and
c) Constant variance (i.e. homoscedastic)
3. We assume that the observations are independent i.e. they do not influence one another (e.g. the way time series observations do), i.e. there is no serial correlation (also called autocorrelation).
4. We assume that there is no statistically significant relationship among the independent or the predictor variables i.e. there is no multicollinearity.
Assessment of the Assumptions.
1. LINEARITY
This is assessed through the use of a scatter diagram for a single independent variable. However, a scatter diagram cannot be used for two or more predictors and where the pattern is not very clear. Thus, the more general or mathematical method of testing for any functional form is to use a measure of forecast error such as the sum of square error (SSE).
SSE = ∑e2 = ∑(y-ŷ)2
We select the functional/mathematical form that minimizes SSE
Example
Suppose four functional forms are fitted and their SSEs are calculated as follows:
Functional form SSE
1 Linear 510
2 Quadratic 503
3 Cubic 412
4 Exponential 692
Conclusion
Out of the 4 fits, the cubic one is the best since it has the least SSE
2. ERROR TERM, e
The following table provides the workings for the error term for the ongoing example.
Workings for the Error Term
|Obs no. |y |ŷ |e |
| 1 |180 |182.38 |-2.38 |
|2 |140 |150.79 |-10.79 |
|3 |230 |235.03 |-5.03 |
|4 |190 |203.44 |-13.44 |
|5 |160 |150.79 |9.21 |
|6 |300 |287.68 |12.32 |
|7 |270 |256.09 |13.91 |
|8 |110 |119.20 |-9.20 |
|9 |240 |213.97 |26.03 |
|10 |320 |340.33 |-20.33 |
| | | |Σe = 0.30 |
a) Assessment for normality
Using a frequency diagram, plot e Vs its frequency after suitable classification, then observe the nature of distribution from the shape of the frequency diagram
| e Class |Frequency f |
|-29 -20 |1 |
|-19 -10 |2 |
|-9 0 |3 |
| 1 10 |1 |
|11 20 |2 |
|21 30 |1 |
Frequency diagram
[Sketch: frequency plotted against the residual classes; the frequencies rise to a peak of 3 for the central class (-9 to 0) and fall away on either side, giving a roughly bell-shaped pattern.]
Observation:
With the exception of the class 1 to 10, the rest of the observations generally conform to normality. This is so in spite of the sample size being small (n = 10). In order to discern the probability distribution of some variable, it is necessary to have a large sample size.
Mathematical method - To test for any goodness of fit in a more rigorous manner, we would need to perform a chi-square (χ²) goodness of fit test.
- If normality requirement is violated, probability statements on predictions and model parameter estimates are not valid.
In practice, there is no remedy for normality except to increase the sample size.
b) Assessment for zero mean for error term, e
Zero mean implies that the residuals (e) are random; they are not systematic i.e. E(e) = 0.
We test for this requirement using the z or t test for a single mean depending on sample size. Recall that the standard error for a single mean is given as:
sē = s/√n ---------- also known as the standard error of the mean, where s is the standard deviation of the residuals
Hypothesis Test
1) Hypothesis: Ho: µe = 0 ------mean of error term is zero
H1: µe ≠ 0 -----mean of error term is not zero
2) Level of significance
Let α = 5%
3) Test statistic - since the sample size is small (n = 10), use the t test with d.f.
v = n - 1 = 10 - 1 = 9. We lose one d.f. since we don't know the population standard deviation; we have to estimate it from the sample.
t = (ē - µe) / (s/√n)
where ē = sample mean of the error term
µe = mean of the error term according to the null hypothesis, Ho, which is zero.
4) Decision criteria
d.f = 9
α = 5%, 2 tail test, t critical = 2.262
[Sketch: t distribution with 2.5% rejection regions below -2.262 and above +2.262; the middle 95% is the fail-to-reject region.]
Decision Rule:
Reject Ho if tcalc is less than -2.262 or tcalc is greater than +2.262, otherwise fail to reject Ho
5) Calculations
ē = Σe/n = 0.30/10 = 0.03 and s (the sample standard deviation of e) ≈ 14.66
NB: These values, ē and s, can be determined directly from a scientific calculator.
tcalc = 0.03 / (14.66/√10)
tcalc = 0.0065
6. Conclusions
i) Statistical conclusion: Since tcalc = 0.0065 is between -2.262 and +2.262, we fail to reject Ho
ii) Managerial decision: The error term has a mean equal to zero and any decision
made on the basis of this conclusion is supported by this study.
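A compact Python sketch (illustrative) of the zero-mean check on the residuals of the example; the t critical value of 2.262 is from tables for 9 degrees of freedom:

```python
from math import sqrt

residuals = [-2.38, -10.79, -5.03, -13.44, 9.21, 12.32, 13.91, -9.20, 26.03, -20.33]
n = len(residuals)

e_bar = sum(residuals) / n                                     # approx. 0.03
s = sqrt(sum((e - e_bar) ** 2 for e in residuals) / (n - 1))   # sample std dev, approx. 14.66
t_calc = e_bar / (s / sqrt(n))                                 # approx. 0.0065

t_crit = 2.262   # t table, df = 9, 5% two-tail
print("reject Ho" if abs(t_calc) > t_crit else "fail to reject Ho: mean of e is zero")
```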
c) Assessment for constant variance.
Plotting y vs x gives an indication of whether the error term has a constant or a changing variance, as follows:
[Sketches: (1) points scattered in a band of constant width about the line - constant variance (homoscedastic); (2) the scatter about the line widens as x increases - changing (increasing) variance (heteroscedastic); (3) points following a curve about a straight line - changing variance due to a non-linear form.]
Comments
1. Constant variance from one observation to another implies that the size of the residuals is not affected by a particular observation, X {case (1)}
2. Lack of constant variance means that the slopes (b coefficients) are unstable and
hence unreliable; they are dependent on the range of the observations {cases (2) and (3}.
3. Remedy for lack of constant variance is to remove the concerned or the affected
independent variable {case (2)} or fit a non-linear function {case (3)}.
3. INDEPENDENCE OF THE OBSERVATIONS
An important assumption in regression analysis is that the observations are independent of one another i.e. the value of a given observation is neither influenced by those values before it nor by those after it. We assume that the data for regression analysis is cross sectional and not longitudinal (time series data)
If the residuals are not independent, then we have serial or auto-correlation, which in practice has the following consequences:
a) The standard errors of the regression coefficients (Sbs) are seriously underestimated.
b) The predictions made from the regression equation will be more variable than is ordinarily expected from least squares estimation.
ŷ = a + b1x1 + b2x2 + --------------
Predictions from such an equation are highly variable (wider than usual confidence intervals), and the Sbs of the slopes are underestimated.
Assessment for Autocorrelation
To test for autocorrelation (1st order serial correlation), we use the Durbin-Watson d statistic:
d = Σ(ei - ei-1)² / Σei²    where ei = error term for observation i or for point in time i
Decision Rule
For independent observations i.e. where there is no serial correlation, d is normally distributed with the expected value (mean) being equal to 2. Thus, as a rule of thumb, for values of d between d = 1 and d = 3 we conclude independence, and outside this range we conclude that there is serial correlation.
[Sketch: the interval 1 < d < 3, centred on d = 2, indicates independence; values below 1 or above 3 indicate the presence of serial correlation.]
SERIAL CORRELATION
Consider the sales of a firm in millions of shillings for seven consecutive months:
|No (i) |Yi |
|1 |60 |
|2 |65 |
|3 |63 |
|4 |69 |
|5 |69 |
|6 |68 |
|7 |71 |
We can have various serial correlations as follows:
|1st order | |2nd order | |3rd order | |
|Yi-1 |Yi |Yi-2 |Yi |Yi-3 |Yi |
|60 |65 |60 |63 |60 |69 |
|65 |63 |65 |69 |65 |69 |
|63 |69 |63 |69 |63 |68 |
|69 |69 |69 |68 |69 |71 |
|69 |68 |69 |71 | | |
|68 |71 | | | | |
We only test for 1st order serial correlation in regression analysis.
Workings for d
|No (i) |ei |ei-1 |ei - ei-1 |(ei - ei-1)² |
|1 |-2.38 |- |- |- |
|2 |-10.79 |-2.38 |-8.41 |70.7281 |
|3 |-5.03 | -10.79 |5.76 |33.1776 |
|4 |-13.44 |-5.03 |-8.41 |70.7281 |
|5 |9.21 |-13.44 |22.65 |513.0225 |
|6 |12.32 |9.21 |3.11 |9.6721 |
|7 |13.91 |12.32 |1.59 |2.5281 |
|8 |-9.20 |13.91 |-23.11 |534.0721 |
|9 |26.03 |-9.20 |35.23 |1241.1529 |
|10 |-20.33 |26.03 |-46.36 |2149.2496 |
| | | | |∑=4624.3311 |
Σ(ei - ei-1)² = 4624.3311 and Σei² = 1933.6274
d = 4624.3311/1933.6274 ≈ 2.39
Conclusion
Since the calculated d value of approximately 2.39 falls in the interval 1 to 3, we conclude that the observations are independent i.e. there is no serial correlation.
Note:
Where there is serial correlation, investigate the cause. In most cases, it will mean time series model would be more appropriate relative to regression method. In rare cases, non-linear model would be appropriate
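A minimal Python sketch (illustrative) of the Durbin-Watson calculation on the residuals of the example:

```python
residuals = [-2.38, -10.79, -5.03, -13.44, 9.21, 12.32, 13.91, -9.20, 26.03, -20.33]

num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
den = sum(e ** 2 for e in residuals)
d = num / den
print(round(d, 2))   # approx. 2.39

# Rule of thumb from the notes: 1 < d < 3 suggests independent observations
print("no serial correlation" if 1 < d < 3 else "serial correlation present")
```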
4. COLLINEARITY OR MULTICOLLINEARITY
In regression analysis, we assume that there is no statistically significant relationship between the independent variables. As such, this concept is only applicable where there are 2 or more independent variables, i.e. in multiple regression analysis.
y = f (x1, x2, ----------------------), with weak or no correlation among the predictors.
The effect of the presence of multicollinearity is to distort the regression equation obtained, especially the slopes (b coefficients), so that they become unreliable - they would be underestimated. This would yield low t values, so that even if there is a significant relationship between an independent variable and the dependent variable, this may be erroneously negated.
In practice, presence of multicollinearity implies that one or more independent variables are superfluous i.e. not necessary. It is like using the same variable twice.
Example
Suppose overhead cost is a function of machine hours and labour hours. Further, suppose most workers use machines so that there is a high correlation between machine hours and labour hours. In an overhead cost prediction model, it will not be necessary to use both machine hours and labour hours because what one of them explains, the other one will add very little explanation to.
A possible scenario could be as follows:
Overhead cost = f (machine hours) ------------------------------------ r² = 80%
Overhead cost = f (labour hours) -------------------------------------- r² = 84%
Overhead cost = f (machine hours, labour hours) ---------------------- r² = 89%
Thus, inclusion of both predictors does not improve r2 substantially.
Remedy for multicollinearity - drop one of the collinear variables; use a correlation matrix as shown next.
Example:
Consider 3 independent variables: x1, x2 and x3 together with the dependent variable Y. Assume they yield the following correlation matrix.
| |Y |X1 |X2 |X3 |
|Y |1 | | | |
|X1 |0.7 |1 | | |
|X2 |-0.9 |0.8 |1 | |
|X3 |0.1 |0.4 |-0.4 |1 |
Suppose we use the rule of thumb for collinearity of a coefficient of correlation, r, greater than 0.7 in absolute value; then we have the following results:
|Variables |r |Remark |
|X1 |Vs |X2 |0.8 |Collinear |
|X1 |Vs |X3 |0.4 |Not collinear |
|X2 |Vs |X3 |-0.4 |Not collinear |
Result: Since X1 and X2 are collinear, we need to remove one of them. This is X1, since it has a weaker correlation to Y (r = 0.7) compared to X2 (r = -0.9); X2 is retained.
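A small Python sketch (illustrative; the correlations are those of the example above) of the screening rule: flag predictor pairs whose |r| exceeds 0.7 and drop whichever member is more weakly correlated with Y:

```python
r_between = {("X1", "X2"): 0.8, ("X1", "X3"): 0.4, ("X2", "X3"): -0.4}
r_with_y = {"X1": 0.7, "X2": -0.9, "X3": 0.1}

for (a, b), val in r_between.items():
    if abs(val) > 0.7:                        # rule-of-thumb threshold for collinearity
        drop = a if abs(r_with_y[a]) < abs(r_with_y[b]) else b
        print(f"{a} and {b} are collinear (r = {val}); drop {drop}")
# prints: X1 and X2 are collinear (r = 0.8); drop X1
```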
MODEL VALIDATION: A CONTEMPORARY PERSPECTIVE
Contemporary model validation suggests the use of a hold-out sample to assess the model. The whole sample is divided into 2 portions.
Sample 1: Used for model development
Sample II: Used for model validation
This is in recognition of the fact that using the same data for both model development and validation
is likely to end up with unrealistically accurate results (self - confirming results)
Problem: In many practical situations, sample sizes are small so that dividing into 2 makes them
even smaller so that it may not be practical to do this.
MULTIPLE REGRESSION ANALYSIS
In many cases, satisfactory prediction of a given response variable, Y, is better achieved by using more than one independent variable. In such cases, we need to extend the methods of simple regression analysis. Where we have two or more independent variables the analysis is referred to as multiple regression analysis.
General structure of the linear multiple regression model.
y = a + b1X1 + b2X2 + --------- + bnXn
where a = y intercept i.e. value of y when all the Xjs = 0
Xj = independent variable j (j = 1, 2, -------, n)
y = the dependent variable
bj = the coefficient (slope) of Xj
Problem: To determine the values of a and bjs (parameter estimates). This can be tedious the
larger the number of predictors.
Normal equations
The following gives a way of obtaining the normal equations.
y = a + b1 X1 +b2 X2 + --------+bn Xn
Number of equations required = n + 1 since there are n + 1 unknowns (a, b1 b2 -------- bn)
For equation number 1, we take ∑s thus:
1. ∑y = ∑a + ∑b1 X1 + ∑b2 X2 + ---------- +∑bn Xn
Considering the constants, we have:
∑y = na + b1∑X1 + b2 ∑X2 + ---------- +bn∑ Xn
For equation numbers 2, 3 and so on, we multiply by the Xjth independent variable respectively,
take summations and factor out the constants as follows:
2. X1 y = a X1 + b1 X12 + b2 X1 X2+ --------------- +bn X1Xn
∑X1 y = ∑aX1+ ∑b1 X12 + ∑b2 X1 X2 + --------- +∑bnX1 Xn
∑X1 y = a∑X1 + b1 ∑X12 + b2 ∑X1 X2 + --------- +bn∑X1 Xn
3. X2y = aX2 + b1 X1 X2 +b2 X22+ ------------- +bn X2Xn
∑X2y = ∑aX2 + ∑b1 X1 X2 +∑b2 X22+ ------------- +∑bn X2Xn
∑X2y = a∑X2 + b1∑x1 X2 +b2 ∑X22+ -------------- +bn ∑X2Xn
‘ ‘ ‘ ‘
‘ ‘ ‘ ‘
n+1. Xny = aXn + b1 X1Xn +b2 X2 Xn+ ---------- +bn Xn2
∑Xny = ∑aXn + ∑b1 X1Xn +∑b2 X2 Xn+ ---------- +∑bn Xn2
∑Xny = a∑Xn + b1 ∑X1Xn +b2 ∑X2 Xn+ ---------- +bn ∑Xn2
Once we have the appropriate ∑s, we substitute in these normal equations to solve the parameter estimates a, b1, b2, --------bn. These normal equations form the basis of the logic
for computer software to determine these estimates.
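In practice these normal equations are set up and solved by software. A minimal numpy sketch (illustrative; the x2 column is hypothetical data added only to show a two-predictor case) that builds the sums and solves for a, b1 and b2:

```python
import numpy as np

x1 = np.array([15, 12, 20, 17, 12, 25, 22, 9, 18, 30], dtype=float)
x2 = np.array([4, 3, 6, 5, 3, 8, 7, 2, 5, 9], dtype=float)   # hypothetical second predictor
y  = np.array([180, 140, 230, 190, 160, 300, 270, 110, 240, 320], dtype=float)
n = len(y)

# Coefficient matrix and right-hand side of the normal equations
A = np.array([[n,         x1.sum(),         x2.sum()],
              [x1.sum(),  (x1 * x1).sum(),  (x1 * x2).sum()],
              [x2.sum(),  (x1 * x2).sum(),  (x2 * x2).sum()]])
rhs = np.array([y.sum(), (x1 * y).sum(), (x2 * y).sum()])

a, b1, b2 = np.linalg.solve(A, rhs)   # parameter estimates
print(a, b1, b2)
```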
METHODS OF MEASURING FORECAST ERROR
These are methods which are used to compare two or more forecasting models. Generally, the smaller the value of the forecast error, the better the model, irrespective of the method used.
Consider the following actual observations, y, the corresponding forecasts, ŷ, and the forecast errors, y - ŷ:
|No |y |ŷ |y - ŷ |
|1 |5 |6 |-1 |
|2 |4 |3 |1 |
|3 |10 |12 |-2 |
|4 |15 |21 |-6 |
|5 |23 |19 |___4___ |
| | | |Σ(y - ŷ) = -4 |
Some of the methods of calculating forecasting error follow.
1. Mean error, ē = [(-1) + 1 + (-2) + (-6) + 4]/5 = -4/5 = -0.8
2. Mean Absolute Deviation (MAD) = Σ|y - ŷ|/n = (1 + 1 + 2 + 6 + 4)/5 = 2.8 ---- |y - ŷ| means ignore the -ve sign of the error
3. Sum of Square Error (SSE) = Σ(y - ŷ)² = (-1)² + 1² + (-2)² + (-6)² + 4² = 58
4. Mean Square Error (MSE) = SSE/n = 58/5 = 11.6
5. Root Square Error (RSE) = √MSE = √11.6 = 3.41
Note: RSE corresponds to Se = Standard Error of Estimate (see regression analysis)
6. Percentage Error (PE) = (y - ŷ)/y × 100 for each observation; summed over the five observations,
PE = (-1/5 + 1/4 - 2/10 - 6/15 + 4/23) × 100 = -37.6%
7. Mean Percentage Error, MPE = Σ[(y - ŷ)/y × 100]/n = -37.6/5 = -7.52%
8. Mean Absolute Percentage Error, MAPE = Σ|(y - ŷ)/y × 100|/n
MAPE = (20 + 25 + 20 + 40 + 17.4)/5 ≈ 24.5%
Each of these methods of measuring forecast error is best used in different situations.
They have their advantages and shortcomings whose discussion is beyond the scope of this text.
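A short Python sketch (illustrative) computing these error measures for the five-observation example:

```python
from math import sqrt

y     = [5, 4, 10, 15, 23]
y_hat = [6, 3, 12, 21, 19]
e = [a - f for a, f in zip(y, y_hat)]
n = len(e)

me   = sum(e) / n                                        # mean error, -0.8
mad  = sum(abs(v) for v in e) / n                        # mean absolute deviation, 2.8
sse  = sum(v ** 2 for v in e)                            # sum of square error, 58
mse  = sse / n                                           # mean square error, 11.6
rse  = sqrt(mse)                                         # root square error, approx. 3.41
mpe  = sum(v / a * 100 for v, a in zip(e, y)) / n        # mean percentage error
mape = sum(abs(v / a) * 100 for v, a in zip(e, y)) / n   # approx. 24.5%
print(me, mad, sse, mse, round(rse, 2), round(mpe, 2), round(mape, 1))
```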
TIME SERIES ANALYSIS
Introduction
Many business and economic phenomena change or vary through time.
Time series data consist of observations through time usually at equal intervals e.g.
- Annually - weekly
- Hourly - monthly observations
Examples:
i) Monthly production level for a company over several years.
ii) Weekly sales for a chain of supermarkets over a couple of months, etc.
Statisticians have long recognised that time series data may consist of the following five components:
1. Secular Trend
2. Cyclical fluctuations
3. Seasonal variations.
4. Random variations
5. Erratic occurrences
SECULAR TREND
This represents a steady increase or decrease generally over a long time in spite of variations in the short term or medium term e.g.
i) Rising number of second-hand motor vehicles sold in Kenya.
ii) Decline in employment over the last 16 years.
Trends are attributed to such factors as population changes, technological progress, large scale shifts in consumer tastes and so on. Since causes of trend are known, it can be well captured mathematically.
[Sketch: a time series plotted over about 30 years showing a rising trend line, cyclical fluctuations about the trend and shorter seasonal variations superimposed on them.]
CYCLICAL FLUCTUATIONS
These occur above or below trend line in the middle term. In the economic /business world, they are characterized by business cycles as shown below.
[Sketch: production plotted against time (years), oscillating about the trend line through an upswing, a peak, a downturn and a trough.]
A cycle contains four phases:
1) The upswing or expansion - during this phase business activity is accelerated. Unemployment is low and production is brisk (recovery)
2) The peak - the rate of economic activity is at its best (boom)
3) The downturn or contraction - when unemployment rises and activity wanes (recession)
4) The trough - where activity is at its lowest point (depression)
There is no satisfactory explanation of business cycles and thus there is no mathematical technique to model them yet.
SEASONAL VARIATION
These are changes that occur within the span of one year and tend to be repeated say weekly,
quarterly, monthly etc. depending on what is causing the time series variable e.g. sale of particular items
is at its peak in given seasons such as:
- umbrellas in the rainy season.
- Ice cream and cold drinks in hot season, etc
Causes:
- weather (rotation of earth around the sun) (this causes natural seasons such as summer and winter)
- customs e.g. religion - these cause artificial seasons such as Christmas, Idd-ul-Fitr, Valentine's Day, etc.
Since the causes of seasonal variations are well known, this component can be well captured mathematically
RANDOM VARIATION
These represent a large number of small environmental influences that operate on a time series, some uplifting, others depressing, but none of which is significant enough to warrant isolation e.g. measurement error.
ERRATIC OCCURENCES
These are non-recurring influences which cannot be mathematically captured yet they can have profound consequences on a time series and they include phenomena such as floods, wars, strikes, unexpected breakthroughs on a peace process, etc.
[Sketch: number of tourists plotted against time; at time T there is a sharp decline in the number of tourists, e.g. due to political violence at that time.]
As much as erratic occurrences may have profound consequences on a time series, it is usually not possible to tell them in advance, but where they have occurred they can be traced from historical records.
Time series forecasting techniques to be studied
1. Moving averages (MA)
- Naïve method
- General MA (also called unweighted MA)
- Weighted MA
- Exponential smoothing
2. Classical decomposition methods
- Additive model
- Multiplicative model
NAIVE METHOD
The naïve forecast is Ft = yt-1
where Ft = current forecast (forecast for time t)
yt-1 = actual previous observation
e.g. If sales for January was 99 units then forecasted sales for February = 99 units.
Advantage
Simple method.
Disadvantages
1) Completely ignores all past observations except the most recent
2) Does not recognize trend, cyclical and seasonal variations – no changes are anticipated
at all.
3) It is a single period forecast.
Note: Naïve method is moving average method with n=1
GENERAL MOVING AVERAGES (UNWEIGHTED MAs)
A moving average forecast takes a number of the latest (n) actual observations and obtains their arithmetic mean, and this becomes the current forecast, e.g. if we take the average sales of the previous 3 months, then this is a 3 month moving average forecast. The general MA, also called the "unweighted" MA, gives equal weights to all the observations used.
Example
Consider the sales in units for a product for the last 6 months. Obtain a 2 month and 3 month
MA forecast.
Make calculations to the nearest whole number.
| | |Forecasts |
|Month |Actual sales (y) |2 month |3 month |
| | |MA |MA |
|April |73 |- |- |
|May |75 |- |- |
|June |77 |74 |- |
|July |72 |76 |75 |
|August |81 |75 |75 |
|September |82 |77 |77 |
|October |- |82 |78 |
The period, n for the MA selected will be that which minimizes forecast error.
Suppose we use MAD to measure forecast error.
For the 2-month MA method,
MAD = (3 + 4 + 6 + 5)/4 = 4.5 --------- note that the -ve sign of a deviation is ignored.
For the 3-month MA,
MAD = (3 + 6 + 5)/3 ≈ 4.67
Therefore, between the two methods, prefer the 2-month MA since it has the lower forecast error (as measured using MAD).
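A small Python sketch (illustrative; the function names are my own) that produces the n-month moving average forecasts and their MAD for the sales data, so different values of n can be compared. The code keeps full precision, so the MADs differ slightly from the rounded table figures, but the ranking (2-month MA preferred) is the same:

```python
def ma_forecasts(actuals, n):
    """Forecast for period i = mean of the previous n actual observations."""
    return {i: sum(actuals[i - n:i]) / n for i in range(n, len(actuals) + 1)}

def mad(actuals, forecasts):
    errors = [abs(actuals[i] - f) for i, f in forecasts.items() if i < len(actuals)]
    return sum(errors) / len(errors)

sales = [73, 75, 77, 72, 81, 82]          # April .. September
for n in (2, 3):
    f = ma_forecasts(sales, n)
    print(n, "month MA forecast for October:", round(f[len(sales)]),
          "MAD:", round(mad(sales, f), 2))
```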
Characteristics of General MA’s
1. The greater the number of periods n in the MA the greater the smoothing effect.
2. As n increases trend is captured better at the expense of seasonal variation and vice versa.
3. If the underlying trend of the past data is thought to be fairly constant, then a greater number of periods should be chosen. Alternatively if there is thought to be some change in the underlying state of the data, then more responsiveness is needed and therefore fewer periods should be used in the MA. In any case, the best model is that which minimizes forecast error.
Advantages of General MA
1. Unlike naïve method, it considers more than one observation.
2. To some extent it captures both seasonal variation and trend.
Disadvantages
1. Gives equal weight to all observations used in the forecast.
e.g. if forecast for October = f (September, August, July) then each of the 3 months is given a weight of 1/3 for a 3-month MA..
It is reasonable to suppose that the most recent data is more relevant to current conditions.
Remedy: use weighted MA, e.g. exponential smoothing.
2. Does not make use of all available data i.e. uses only the most recent n observations and ignores all the others.
Remedy: Exponential smoothing.
3. Forecast is for only one time period (very short-term – true for all MA models)
Remedy: For long term forecasts in time series, use decomposition methods.
4. Does not explicitly address seasonality and trend.
Remedy: Decomposition methods and modified exponential smoothing.
5. The general MA method can require large data storage, especially for a multi-product firm, which can be costly. However, this is no longer much of a problem given the enhanced data storage facilities currently available.
Remedy: Exponential smoothing and decomposition methods.
WEIGHTED MA
This is a technique whose major objective is to overcome some limitations of the unweighted MA by assigning weights whose magnitude decreases the further back in time the observation belongs. The weights are chosen arbitrarily, although some guiding principles are:
i) they should add up to 1.
ii) They should minimize forecast error.
Example
For the preceding data, suppose we want to determine a 3-month MA forecast of sales for October. Further, suppose we assign a weight of 0.5 to the September value, 0.3 to August and hence 0.2 to the value of July; then the forecast for October will be:
Foct = 82 x 0.5 + 81 x 0.3 + 72 x 0.2 = 79.7 ≈ 80 units
Advantages
Recognizes that more recent observations require greater weight than the older ones. Other advantages are the same as those of the unweighted MA method.
Disadvantages
These are the same as for the unweighted MA except for the weighting aspect.
EXPONENTIAL SMOOTHING
Introduction.
This is a time series forecasting technique which overcomes some limitations of the general MA method.
Actually it is a special case of MA method. The method automatically weighs past data with weights that decrease exponentially with time. i.e. the most recent values receive the greatest weighting and the older observations receive increasingly decreasing weights. Also, theoretically, it uses all past data.
Exponential Smoothing: Model Derivation
Let α be the smoothing constant
NB: 0 ≤ α ≤ 1
Ft = forecast for time t
yt = actual observation for time t (this is the historical value)
Forecast for time t + 1 is given as:
I: Ft+1 = αyt + α(1 - α)yt-1 + α(1 - α)²yt-2 + --------------------
The weights decline exponentially with time.
Given that a forecast can be done at any point of time t, we have equation II for the forecast at time t as follows:
II: Ft = αyt-1 + α(1 - α)yt-2 + α(1 - α)²yt-3 + -----------
Multiply equation II by the common ratio, (1 - α), thus:
III: (1 - α)Ft = α(1 - α)yt-1 + α(1 - α)²yt-2 + α(1 - α)³yt-3 + --------------
To obtain equation IV, we have I - III thus:
IV: Ft+1 - (1 - α)Ft = αyt ........... note that the like terms in the two equations cancel out.
Rearrange equation IV to obtain equation V thus:
V: Ft+1 = αyt + (1 - α)Ft
From equation V, we find that:
Current forecast = α x previous actual observation + (1 - α) x previous forecast.
Alternative formula:
If we remove brackets in equation V, we have equation VI below:
VI: Ft+1 = αyt + Ft - αFt
Rearranging equation VI, we have equation VII next.
VII: Ft+1 = Ft + αyt - αFt
Factorizing equation VII, we have equation VIII
VIII: Ft+1 = Ft + α(yt - Ft)
NB: yt - Ft = forecast error
Hence, from the foregoing equation, we find that:
Current forecast = previous forecast + α x previous forecast error
A characteristic of exponential smoothing is that the weights add up to 1 as proved next.
Let the sum of all the weights be S
then I: S = α + α(1-α) + α(1-α)² + ...
We multiply equation I by the common ratio, (1-α), to obtain equation II as follows:
II: (1-α)S = α(1-α) + α(1-α)² + ...
To obtain equation III we subtract II from I thus:
III = I - II: S - (1-α)S = α .......... (the rest of the terms cancel out since they are the same)
Expand equation III thus:
S - S + αS = α
This results in: αS = α
so that S = α/α = 1 .......... proved
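A quick numerical check of this result can be done in a few lines of Python; the value α = 0.4 and the 50-term truncation are illustrative choices only.

```python
# Numerical check that the weights α, α(1-α), α(1-α)², ... sum to 1.
# alpha = 0.4 and the 50-term truncation are arbitrary illustrative choices.
alpha = 0.4
weights = [alpha * (1 - alpha) ** k for k in range(50)]
print(sum(weights))   # very close to 1 (exactly 1 in the infinite sum, as proved above)
```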
Example.
The sales of a product for the last 6 months follow. Obtain forecasts using exponential smoothing for
i) α = 0.2 and ii) α = 0.7, and hence determine the better model in terms of minimum SSE (sum of squared errors).
Let the forecast for May be the actual for April (starting point) and make calculations to the nearest whole number.
|Month t |Actual observation y_t |Forecast (α = 0.2) |Forecast (α = 0.7) |Squared error (α = 0.2) |Squared error (α = 0.7) |
|April |44 |- |- |- |- |
|May |45 |44 |44 |1 |1 |
|June |41 |44.2 |44.7 |10.24 |13.69 |
|July |46 |43.6 |42.1 |5.76 |15.21 |
|August |38 |44.1 |44.8 |37.21 |46.24 |
|September |40 |42.9 |40.0 |8.41 |0 |
| | | |SSE = |62.62 |76.14 |
Conclusion
Prefer α = 0.2 since it results in the lower SSE, i.e. it gives more accurate forecasts.
NB: A characteristic of the exponential smoothing method is that as α → 0, the current forecast tends to the previous forecast, and as α → 1, the current forecast tends to the previous actual observation (the naive forecast).
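The worked example can be reproduced with a short Python sketch of equation V. Note that the table above rounds each forecast to one decimal place before computing the next, so this sketch, which does not round, gives slightly different SSE values; the ranking of the two α values is unchanged. The function name is illustrative.

```python
# Sketch of equation V: F(t+1) = alpha*y(t) + (1 - alpha)*F(t). The data, the two
# alpha values and the starting forecast (May forecast = April actual) come from
# the worked example above.

def exp_smoothing_forecasts(y, alpha):
    """One-step-ahead forecasts; the forecast for the first period is None."""
    forecasts = [None, y[0]]                       # starting point: first forecast = first actual
    for t in range(1, len(y) - 1):
        forecasts.append(alpha * y[t] + (1 - alpha) * forecasts[-1])
    return forecasts

sales = [44, 45, 41, 46, 38, 40]                   # April .. September
for alpha in (0.2, 0.7):
    f = exp_smoothing_forecasts(sales, alpha)
    sse = sum((y - fc) ** 2 for y, fc in zip(sales, f) if fc is not None)
    print(alpha, round(sse, 2))                    # SSE about 62 and 77: prefer alpha = 0.2
```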
Advantages of exponential smoothing
1. Gives greater weight to more recent data
2. All past data are used.
3. Requires very little data (actually only three items: y_t, F_t and α).
4. Like the general MA method, it is an adaptive forecasting system i.e. it adapts continually as new data becomes available and so it is frequently incorporated as part of stock and production control systems.
5. The value of α can be altered to reflect changing circumstances but, generally, use the α which minimizes forecast error.
6. It is well grounded theoretically and yet simple to use and explain to non-specialist users.
Disadvantages
1. It does not directly cater for seasonality and trend. However, the basic model has been modified to cater for them as follows:
i) Double exponential smoothing (Holt's model) caters for trend.
ii) Triple exponential smoothing (Holt-Winters model) caters for both trend and seasonality.
2. Not suitable for long-term forecasting
Remedy: Use classical (traditional) decomposition methods or Holt-Winters model.
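For completeness, a hedged sketch of double exponential smoothing (Holt's linear method) follows; the smoothing constants and the initialisation of the level and trend are illustrative assumptions, not values taken from these notes.

```python
# Sketch of double exponential smoothing (Holt's linear method) for a trending
# series. The smoothing constants (alpha, beta) and the simple initialisation
# below are illustrative assumptions.

def holt_forecast(y, alpha, beta, horizon=1):
    level, trend = y[0], y[1] - y[0]              # initialise from the first two points
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + horizon * trend                # forecast `horizon` periods ahead

sales = [44, 45, 41, 46, 38, 40]                  # the sales series from the example above
print(round(holt_forecast(sales, alpha=0.3, beta=0.1), 1))   # forecast for the next month
```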
Classical Decomposition of a Time Series
Introduction:
Also called the traditional decomposition method, this presumes that a time series observation is influenced by all the components of a time series, i.e. y = f(T, C, S, R) where
T = Secular trend
C = Cyclical fluctuation
S = Seasonal variation
R = Random variation
NB: Since erratic (random) occurrences are largely unpredictable, this component is left out of the forecasting model.
There are generally two types of decomposition models.
1) Additive model: y = T + C + S + R -------- assumes the components are independent, i.e. they do not influence one another.
2) Multiplicative model: y = T x C x S x R -------- assumes the components are dependent, i.e. that they influence one another.
The suitable model is the one which minimizes forecast error, although a scatter diagram of the time series can sometimes reveal which is preferable.
[Sketch graphs of y against time: 1) seasonal swings of roughly constant size about the trend - use the additive model; 2) seasonal swings that change (in this case increase) over time - use the multiplicative model.]
Note that the use of classical decomposition only arises if it is thought that there is a seasonal component - hence there must be at least 2 observations in the year for it to be used.
Example
The sales of a product for the last 3 years recorded on a quarterly basis are as follows {columns (1) to (4)}.
Required
a) Obtain a graph of t vs Y. What do you observe?
b) Determine both an additive and multiplicative forecasting models and hence make forecasts for the next year. Which model is preferred? Use MSE to compare them
Additive Time Series model
The ultimate goal of the additive model is to obtain a forecasting model thus:
Ft = T + Si where Ft = forecast for time t,
T = trend value for time t
Si = seasonal component for season i
Steps for the Additive Model
1. Obtain an n-period Moving Average (MA) where n = number of observations in the year.
In this example, this will be a 4-quarter MA since there are four observations in the year
{column (5)}
The objective of the MA is to obtain the components T + C, although we ignore C since it is difficult to isolate; thus the MA is taken as the trend T. This works because the MA spans all the seasons in a year, so the seasonal and random components (S + R) are eliminated through smoothing.
2. If n is even (like in this case), obtain the centred MA as the arithmetic mean of two adjacent uncentred MAs {column (6)}. If n is odd, centering is automatic.
3. Isolate S + R as Y - MA, which is T + C + S + R - (T + C) which yields S + R
Workings for Additive Model
|(1) Period t |(2) Year |(3) Qtr |(4) Sales Y = T+C+S+R |(5) Uncentred 4-qtr MA |(6) Centred MA = T+C |(7) S+R = Y - MA |(8) Seasonal component S |
|1 |I |Q1 |72 | | | |-47.985 |
|2 | |Q2 |110 | | | |-11.985 |
| | | | |117.75 | | | |
|3 | |Q3 |117 | |118.25 |-1.25 |0.705 |
| | | | |118.75 | | | |
|4 | |Q4 |172 | |119.00 |53.00 |59.265 |
| | | | |119.25 | | | |
|5 |II |Q1 |76 | |120.88 |-44.88 |-47.985 |
| | | | |122.50 | | | |
|6 | |Q2 |112 | |125.25 |-13.25 |-11.985 |
| | | | |128.00 | | | |
|7 | |Q3 |130 | |128.25 |1.75 |0.705 |
| | | | |128.50 | | | |
|8 | |Q4 |194 | |129.38 |64.62 |59.265 |
| | | | |130.25 | | | |
|9 |III |Q1 |78 | |130.00 |-52.00 |-47.985 |
| | | | |129.75 | | | |
|10 | |Q2 |119 | |130.63 |-11.63 |-11.985 |
| | | | |131.50 | | | |
|11 | |Q3 |128 | | | | 0.705 |
|12 | |Q4 |201 | | | |59.265 |
(Each uncentred MA spans four quarters and therefore falls midway between two periods; it is shown on the intermediate rows.)
4. Obtain S as arithmetic mean of values of S + R from the same season. This is conveniently
done in the following table. These are values of column (7) which are rearranged on the basis
of season (quarter) and the year so that those from the same season are aligned.
Workings for Seasonal Components
|Quarter |Year I |Year II |Year III |Unadjusted S |Adjusted S |
|Q1 |- |-44.88 |-52.00 |-48.44 |-47.985 |
|Q2 |- |-13.25 |-11.63 |-12.44 |-11.985 |
|Q3 |-1.25 |1.75 |- |0.25 |0.705 |
|Q4 |53.00 |64.62 |- |58.81 |59.265 |
|Sum | | | |-1.82 |0 |
Mean of unadjusted S = -1.82 / 4 = -0.455
Notes:
i) Since R is random, we can assume that it is what causes differences in the seasonal factor for a particular season and hence obtaining the arithmetic mean of S + R for a particular season eliminates R.
ii) The sum of the unadjusted S values is -1.82 and their mean is -0.455. Theoretically, the sum and mean of S for the additive model should be zero, i.e. the seasonal effects should cancel out around the trend.
To achieve this, we adjust the S values using the negative of the mean, i.e. -(-0.455) = +0.455. So we add 0.455 to each unadjusted value of S to get the adjusted values. These then become the respective seasonal indexes.
iii) Those seasons with -ve values of S (Q1 and Q2) are low seasons since they are below trend. Conversely, Q3 and Q4, having +ve values of S, represent high seasons; they are above trend.
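Steps 1 to 4 can be verified with the following Python sketch; the variable names are illustrative, and small differences from the table are due to rounding.

```python
# Sketch of steps 1 to 4 of the additive decomposition: centred 4-quarter moving
# averages, S + R = Y - MA, and seasonal components adjusted to sum to zero.
# The quarterly sales are those of the worked example.

sales = [72, 110, 117, 172, 76, 112, 130, 194, 78, 119, 128, 201]
n = 4                                                    # observations per year

# Step 1: uncentred 4-quarter moving averages.
unc = [sum(sales[i:i + n]) / n for i in range(len(sales) - n + 1)]

# Step 2: centred MA (mean of adjacent uncentred MAs); keys are 0-based period indices.
centred = {i + n // 2: (unc[i] + unc[i + 1]) / 2 for i in range(len(unc) - 1)}

# Step 3: S + R = Y - centred MA.
s_plus_r = {t: sales[t] - ma for t, ma in centred.items()}

# Step 4: average S + R by quarter, then adjust so the components sum to zero.
raw = []
for q in range(n):
    vals = [v for t, v in s_plus_r.items() if t % n == q]
    raw.append(sum(vals) / len(vals))
adj = [s - sum(raw) / n for s in raw]

print([round(s, 3) for s in adj])   # about -47.98, -11.98, 0.70, 59.27, as in the table
```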
5. Now that we have S, we need to obtain the trend. Assume a linear trend. Then we have:
T = a + bt where t = period (we let the first observation have t = 1).
The trend is fitted to the seasonally adjusted (also called deseasonalized) values, T*, since these are devoid of S but retain the other components, as shown next:
Seasonally adjusted value, T* = Y - S
T* = T + C + S + R - S
which results in T* = T + C + R
The workings are to the nearest whole number in the next table. We use ordinary least
squares (OLS) method to obtain the line of best fit using the following normal equations:
ΣT* = na + bΣt
Σ(tT*) = aΣt + bΣt²
Table of OLS Trend Workings for Additive Model
|Year |Quarter |Period t |Y |S |T* = Y - S |tT* |t² |
|I |Q1 |1 |72 |-47.985 |120 |120 |1 |
| |Q2 |2 |110 |-11.985 |122 |244 |4 |
| |Q3 |3 |117 |0.705 |116 |348 |9 |
| |Q4 |4 |172 |59.265 |113 |452 |16 |
|II |Q1 |5 |76 |-47.985 |124 |620 |25 |
| |Q2 |6 |112 |-11.985 |124 |744 |36 |
| |Q3 |7 |130 |0.705 |129 |903 |49 |
| |Q4 |8 |194 |59.265 |135 |1080 |64 |
|III |Q1 |9 |78 |-47.985 |126 |1134 |81 |
| |Q2 |10 |119 |-11.985 |131 |1310 |100 |
| |Q3 |11 |128 |0.705 |127 |1397 |121 |
| |Q4 |12 |201 |59.265 |142 |1704 |144 |
|Totals | |Σt = 78 | | |ΣT* = 1509 |Σ(tT*) = 10056 |Σt² = 650 |
We apply the normal equations to obtain the following simultaneous equations:
12a + 78b = 1509
78a + 650b = 10056
We can write this as a matrix equation thus:
[ 12   78 ] [ a ]   [ 1509  ]
[ 78  650 ] [ b ] = [ 10056 ]
     A        X   =     B
Solution using the matrix inverse method.
If we consider this matrix equation as A x X = B, then X = A⁻¹B.
Thus, we need to obtain A⁻¹.
det A = 12 x 650 - 78² = 1716
Next, the inverse of matrix A is:
A⁻¹ = (1/1716) [ 650  -78 ]
               [ -78   12 ]
so that the unknown, X, is:
X = A⁻¹B = (1/1716) [ 650  -78 ] [ 1509  ] = (1/1716) [ 196482 ] = [ 114.5 ]
                    [ -78   12 ] [ 10056 ]            [ 2970   ]   [ 1.73  ]
Thus a = 114.5
b = 1.73
and the trend equation is:
T = 114.5 + 1.73t
Hence, the additive prediction model is
Ft = T + Si where i = season or quarter (i = 1, 2, 3, 4)
Ft = (114.5 + 1.73t) + Si
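As a cross-check, the two normal equations can be solved directly, for example with numpy; the entries below are the column totals from the OLS workings table above.

```python
# Cross-check of the matrix solution using numpy. A holds [[n, sum t], [sum t, sum t^2]]
# and B holds [sum T*, sum t*T*] from the OLS workings table.
import numpy as np

A = np.array([[12, 78], [78, 650]], dtype=float)
B = np.array([1509, 10056], dtype=float)
a, b = np.linalg.solve(A, B)
print(round(a, 2), round(b, 2))   # about 114.5 and 1.73
```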
YEAR IV Forecasts - Additive Model
|Quarter |t |Si |Ft = (114.5 + 1.73t) + Si |
|Q1 |13 |S1 = -47.985 |89 |
|Q2 |14 |S2 = -11.985 |127 |
|Q3 |15 |S3 = 0.705 |141 |
|Q4 |16 |S4 = 59.265 |201 |
Exercise
Determine the forecasts for all quarters of year V using the preceding additive model.
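A small sketch of how such forecasts can be generated follows; extending the range of t to 20 covers the year V quarters asked for in the exercise above.

```python
# Sketch: generating additive-model forecasts for year IV (t = 13 .. 16) from the
# fitted trend and the adjusted seasonal components above.

seasonal = {1: -47.985, 2: -11.985, 3: 0.705, 4: 59.265}   # Si by quarter

def additive_forecast(t):
    quarter = (t - 1) % 4 + 1
    return (114.5 + 1.73 * t) + seasonal[quarter]

for t in range(13, 17):                      # use range(13, 21) for years IV and V
    print(t, round(additive_forecast(t)))    # 89, 127, 141, 201 as in the table
```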
MULTIPLICATIVE MODEL
Assumption: Y = T x C x S x R
Objective: To obtain the prediction model Ft = T x Si
The steps followed are identical to those of the additive model, except that where there was addition we use multiplication, and where there was subtraction we use division.
Steps in Multiplicative Model
1. Obtain the MAs and centre them if need be. The MA for the multiplicative model represents T x C, since S and R are eliminated (or "smoothed" out). These are columns (5) and (6) of the additive workings table.
2. Isolate S x R as Y / MA, i.e. (T x C x S x R) / (T x C) = S x R.
3. Obtain the unadjusted seasonal component S as the arithmetic mean of the Y / MA ratios from the same season, then adjust the values so that their sum equals the number of seasons in the year (here 4). This is done in the following table.
Workings for Seasonal Ratios (S x R = Y / centred MA)
|Quarter |Year I |Year II |Year III |Unadjusted S |Adjusted S |
|Q1 |- |0.63 |0.60 |0.615 |0.617 |
|Q2 |- |0.89 |0.91 |0.900 |0.902 |
|Q3 |0.99 |1.01 |- |1.000 |1.003 |
|Q4 |1.45 |1.50 |- |1.475 |1.478 |
|Sum | | | |3.99 |4.00 |
Mean of unadjusted S = 3.99 / 4 = 0.9975
Adjustment is done by dividing each unadjusted S value by this mean, 0.9975.
Note that seasons 1 and 2 are low seasons (S less than 1) while seasons 3 and 4 are high seasons (S greater than 1). This is consistent with the additive model already studied.
4. Calculations for Trend
Assuming a linear trend, T = a + bt
where t = period (we let the first observation have t = 1)
and the trend is fitted to the deseasonalised (seasonally adjusted) data:
T* = Y / S = (T x C x S x R) / S = T x C x R
We obtain a and b using the OLS procedure (normal equations) thus:
12a + 78b = 1502
78a + 650b = 9987
Matrix format:
[ 12   78 ] [ a ]   [ 1502 ]
[ 78  650 ] [ b ] = [ 9987 ]
Use of Cramer's Rule (det A = 12 x 650 - 78² = 1716):
a = (1502 x 650 - 78 x 9987) / 1716 = 197314 / 1716 = 114.98
b = (12 x 9987 - 78 x 1502) / 1716 = 2688 / 1716 = 1.57
Hence, the trend equation is:
T = 114.98 + 1.57t
The multiplicative model forecast is:
Ft = T x Si where i = season or quarter (i = 1, 2, 3, 4)
Ft = (114.98 + 1.57t) x Si
Year IV Forecasts - Multiplicative Model
|Quarter |t |Si |Ft = (114.98 + 1.57t) x Si |
|Q1 |13 |S1 = 0.617 |84 |
|Q2 |14 |S2 = 0.902 |124 |
|Q3 |15 |S3 = 1.003 |139 |
|Q4 |16 |S4 = 1.478 |207 |
Exercise
Determine the forecasts for all the quarters of year V using the multiplicative model.
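As with the additive model, a small sketch for generating these forecasts follows; extend the range of t to 20 for the year V exercise.

```python
# Sketch: multiplicative-model forecasts for year IV (t = 13 .. 16), using the
# adjusted seasonal ratios and the trend fitted above.

seasonal = {1: 0.617, 2: 0.902, 3: 1.003, 4: 1.478}        # Si by quarter

def multiplicative_forecast(t):
    quarter = (t - 1) % 4 + 1
    return (114.98 + 1.57 * t) * seasonal[quarter]

for t in range(13, 17):                           # use range(13, 21) for years IV and V
    print(t, round(multiplicative_forecast(t)))   # 84, 124, 139, 207 as in the table
```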
Comparison of Accuracy of the Additive & Multiplicative Models: use of MSE
|t |Actual y |Additive model fitted value |Squared error (additive) |Multiplicative model fitted value |Squared error (multiplicative) |
|1 |72 |68 |16 |72 |0 |
|2 |110 |106 |16 |107 |9 |
|3 |117 |120 |9 |120 |9 |
|4 |172 |181 |81 |179 |49 |
|5 |76 |75 |1 |76 |0 |
|6 |112 |113 |1 |112 |0 |
|7 |130 |127 |9 |126 |16 |
|8 |194 |188 |36 |189 |25 |
|9 |78 |82 |16 |80 |4 |
|10 |119 |120 |1 |118 |1 |
|11 |128 |134 |36 |133 |25 |
|12 |201 |195 |36 |198 |9 |
| | | |SSE = 258 | |SSE = 147 |
MSE (additive model) = 258 / 12 = 21.5
MSE (multiplicative model) = 147 / 12 = 12.25
Conclusion
Prefer the multiplicative model since it gives a better fit, i.e. it has the lower MSE.
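The comparison can be reproduced with the following sketch; fitted values are left unrounded here, so the MSE values differ slightly from the table, which rounds fits to whole units, but the conclusion is the same.

```python
# Sketch: comparing the additive and multiplicative models by MSE over the
# fitting period (t = 1 .. 12), using the trend equations and seasonal
# components derived above.

sales = [72, 110, 117, 172, 76, 112, 130, 194, 78, 119, 128, 201]
add_s  = {1: -47.985, 2: -11.985, 3: 0.705, 4: 59.265}
mult_s = {1: 0.617, 2: 0.902, 3: 1.003, 4: 1.478}

def mse(fitted):
    return sum((y - f) ** 2 for y, f in zip(sales, fitted)) / len(sales)

add_fit  = [(114.5  + 1.73 * t) + add_s[(t - 1) % 4 + 1]  for t in range(1, 13)]
mult_fit = [(114.98 + 1.57 * t) * mult_s[(t - 1) % 4 + 1] for t in range(1, 13)]
print(round(mse(add_fit), 1), round(mse(mult_fit), 1))   # roughly 22 vs 13: prefer multiplicative
```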
A comparative summary of some features of the two models
| |Sum of S |Average of S |Forecasting model |
|Additive model |0 |0 |Ft = T + S |
|Multiplicative model |n |1 |Ft = T x S |
n = number of seasons in the year
Advantages of decomposition methods
1) Explicitly considers all components of a time series, especially the seasonal and trend components.
2) It is suitable for long-term forecasting, unlike the MA methods.
3) Once the model is obtained, we require no data other than the trend equation and the seasonal indexes.
Shortcomings.
1) Rather tedious to develop.
2) It does not use all past data unlike exponential smoothing.
-----------------------
[Diagram: classification of FORECASTING TECHNIQUES.
Qualitative methods: Delphi method, Market research, Historical analogy.
Quantitative methods:
- Time series methods: moving averages and exponential smoothing, decomposition methods, autoregressive methods.
- Causal methods: regression analysis (simple, multiple), econometrics.
Graphical methods: visual fit method, Hi-Low method.]
Recall: Derivative of a function of a function (chain rule)
1) If y = (15 - 2x)^10
then dy/dx = 10(15 - 2x)^9 x (-2) = -20(15 - 2x)^9
2) If y = 15(21x² - 3x)^1.9
then dy/dx = 28.5(21x² - 3x)^0.9 (42x - 3)
Normal equations (simple linear regression):
Σy = na + bΣx
Σxy = aΣx + bΣx²
r² is used for simple regression; R² for multiple regression.
Use the t distribution for small samples and z for large samples.
Confidence interval for a regression coefficient: CI = b ± (t or z) x Sb
Changing variance is referred to as heteroscedasticity.
NB: Short term: < 1 year
Medium term: > 1 year but < 15 years
Long term: > 15 years
e.g. for the exponential smoothing weights:
If α = 0.4
then α(1-α) = 0.4 x 0.6 = 0.24
α(1-α)² = 0.4 x 0.6² = 0.144, and so on
* Collinearity: 2 predictors. * Multicollinearity: 3 or more predictors.
Note the distinction between T* and T in these equations