Introduction - Pearson Education



Instructor’s Manual

Quantitative Methods for Business and Economics

2nd edition

Modular texts in business and economics

Glyn Burton

George Carrol

and

Stuart Wall

[pic]

ISBN 0 273 65524 8

( Pearson Education 2002

Lecturers adopting the main text are permitted to photocopy the pack as required.

[pic]

Pearson Education Limited

Edinburgh Gate

Harlow

Essex CM20 2JE

England

and Associated Companies around the world

Visit us on the World Wide Web at:



____________________________

First published 2002

© Pearson Education Limited 2002

The rights of Glyn Burton, George Carrol and Stuart Wall

to be identified as authors of this Work have been asserted by them

in accordance with the Copyright, Designs and Patents Act 1988.

ISBN 0 273 65524 8

All rights reserved. Permission is hereby given for the material in this

publication to be reproduced for OHP transparencies and student handouts,

without express permission of the Publishers, for educational purposes only.

In all other cases, no part of this publication may be reproduced, stored in

a retrieval system, or transmitted in any form or by any means, electronic,

mechanical, photocopying, recording, or otherwise without either the prior

written permission of the Publishers or a licence permitting restricted copying

in the United Kingdom issued by the Copyright Licensing Agency Ltd,

90 Tottenham Court Road, London W1P 0LP.

Contents

A variety of spreadsheet applications and exercises using Excel are presented for the following topic areas (not all chapter headings are covered). Many of these are integrated with materials in the existing textbook, others are freestanding. A separate data file contains the raw data to be used in many of the spreadsheet exercises. The materials conclude with two computerised practice tests.

1 Introduction: Excel

2 Central location and dispersion

3 Regression and correlation

4 Probability distributions

5 Sampling and tests of hypotheses 1

6 Sampling and tests of hypotheses 2

7 Index numbers

8 Time value of money

9 Basic mathematics

10 Exam Practice

Test 1

Test 2

Data files

Introducing Excel

By the end of this chapter you should be able to:

❑ understand the Excel screen;

❑ enter data;

❑ edit data;

❑ save your workbook;

❑ work with ranges of cells;

❑ open a saved workbook;

❑ create graphs and charts.

1.1 The use of spreadsheets

For many large collections of real data, computer spreadsheets are used to organise and analyse information. Although the spreadsheet was designed for the calculation and organisation of financial data, it is an extremely powerful tool for most forms of business and economic data.

The value of the spreadsheet is that once the data is entered it is a simple and quick process to analyse and manipulate it in a variety of sophisticated ways. This means that the analysis (and the arithmetic) is automatic. In order to do this, of course, you need a level of computer literacy and you need to learn the basics of spreadsheets. Microsoft’s Excel is the largest-selling spreadsheet package in the business world. Consequently, we will use the package to support the textbook and to provide you with a very marketable skill when you graduate.

1.2 To start work in Excel

Once you have logged into the network you will find the Excel icon:

Move the mouse over the icon and select it (double-click on it with the left mouse button).

The Excel application should load, presenting you with the screen as in Figure 1.1.

[pic]

figure 1.1 The Excel window

1.3 Understanding the worksheet screen

Overview

A spreadsheet or worksheet is the equivalent of a large sheet of paper, divided into columns and rows. A workbook consists of one or more worksheets – each worksheet can be regarded as a ‘page’ in the book.

When you start Excel it opens a workbook window containing a group of related worksheets named, initially, Sheet1, Sheet2 etc. An examination of the screen shows that Excel consists of two windows:

• The application window: found at the top of the screen, it contains all the Excel commands – menus, tool bars etc.

• The document window: the main part of the screen, which consists of the worksheet itself – divided into columns and rows, plus other features such as scroll bars and sheet numbers.

The application window

The menu bar

[pic] [pic]

Provides a list of menu options. The Excel commands are grouped under these menus

The tool bars

There are two tool bars immediately below the menu bar; each consists of a row of buttons that you ‘click’ to carry out Excel tasks. Often a button is a shortcut alternative to a menu command; occasionally there is no menu alternative to using a button.

The standard toolbar

[pic]

Contains the buttons used for formatting, editing, file handling and printing.

The formatting toolbar

[pic]

Contains buttons used for controlling the appearance of the contents of each cell or range of cells. This includes, for example, fonts, borders, shading, alignment and numeric formats.

The formula bar

[pic]

Shows whatever is in entered in the active cell. This is a new workbook so all the cells are blank.

The Reference Area – shows the row and column number of the active cell.

The title bar

[pic] [pic]

Whenever a new workbook is opened Excel gives it a temporary or default name, Book1, Book2 etc. You should change the name when you save the workbook.

Maximise/restore buttons

At the top right of the screen are two groups of three buttons; these control the size of the worksheet screen. One set is for the whole application window and one for the inner cell area, the document window.

The minimise button [pic] Reduces the size of the screen to a small icon.

The maximise button [pic] Increases the window to full screen size.

The close button [pic] Closes the workbook.

The restore button [pic] Restores the window to its original size.

The document window

A worksheet window can only show a small fraction (see Figure 1.2) of the total worksheet size; potentially each sheet in a workbook can be 256 columns across and 16,384 rows down!

[pic]

figure 1.2 The document window

The scroll bars

[pic]

Horizontal and vertical bars enable you to move around a worksheet.

A scroll box in the bar shows the current display’s position relative to the entire workbook contained in that window.

In the workbook illustrated in Figure 1.1, the scroll boxes show that we are positioned at the top left portion of the worksheet (as in Figure 1.2). The length of the scroll boxes indicates the amount of the worksheet in use.

Cell references

[pic]

Column headings contain the column references.

Row headings contain the row references.

Spreadsheets use cell references, based on column and row headings, to:

❑ identify cells or groups of cells on a worksheet – a group of cells is known as a cell range;

❑ point to cells containing the value to be used in a formula.

The reference of this cell is D4

The reference of this range is B7:E11

The active cell

A heavy border surrounding

the cell indicates the active cell.

Sheet names

[pic]

A new workbook consists of a number of blank worksheets, having the default names Sheet1, Sheet2 etc. Collectively these form a workbook. Each sheet is marked with a name tab – the name highlighted indicates which sheet is currently selected or ‘active’. In the example above, Sheet1 is currently selected. To open a new sheet click on the name tab.

On the left of the sheet names are a number of arrow buttons to move quickly through a group of worksheets.

The status bar

[pic]

Is positioned below the sheet names and displays information about the current command; no command has been issued yet so it reads ‘Ready’. When a cell is made active it will read ‘Enter’.

1.4 Entering data

At the data entry stage, you should ignore the appearance of the worksheet and concentrate on entering your data accurately. Therefore, avoid all the intricacies of formatting.

Types of data

Before data can be converted into information, you need to enter the raw data into the cells that make up the worksheet. There are many types of data that you can enter, including:

1. text;

2. numbers;

3. dates;

4. times;

5. formulas and functions.

Entering text

You can enter any combination of letters and numbers as text. Text is automatically left-aligned in a cell.

To enter text into a cell:

1. Select the cell into which you wish to enter text.

2. Type the text. As you type, your text appears in the cell and in the formula bar, as shown in Figure 3.

3. Click the ACCEPT button [pic] or use the key marked ↵ on the keyboard.

Whatever you type also appears on the formula bar.

figure 3 The formula bar

If you make a mistake and wish to cancel an entry before it is complete, press ESC or click on the Clear button [pic] on the formula bar.

Number text

You may want to enter a number as text (for example, a telephone number). Precede your entry with a single quotation mark (‘), as in ‘946220. The single quotation mark is an alignment prefix that tells Excel to treat the following characters as text and left-align them in the cell.

Entering numbers

Numbers include all the numeric characters zero to nine (0–9) as well as the mathematical operators and, on some occasions, ( ) , $ %. Numbers are automatically right-aligned.

To enter a number:

1. Select the cell into which the number is to be entered.

2. Type the number. To enter a negative number, precede it with a minus sign, or surround it with parentheses.

3. However, it is not necessary to use a comma to separate thousands.

4. Click on the Enter button [pic] on the formula bar, or use the Enter key (↵) on the keyboard.

It is better to format a cell to a currency format rather than type a column of pound sterling amounts including the pound signs and decimal points. For example, you can type numbers such as 241 and 25.17, and then format the column with currency formatting. Excel would then change your entries to £241.00 and £25.17, respectively.

There is, therefore, no need to enter:

• a pound sign (£), or

• the dollar sign ($) before the number.

Where possible, calculate percentages rather than use the % icon.

[pic]

If you enter a number, and it appears in the cell as hatch signs, #######, it means that the cell is not wide enough to display the number. To remedy this, position the cursor on the extreme right boundary of the column heading – when you are in the correct position the cursor changes to a double-pointed arrow [pic] – then adjust the width using the mouse.

An important point to consider is that the accuracy of your results depends upon the accuracy of your data. You should, therefore, always enter the data as accurately as possible. This is true not only when using a spreadsheet but whenever you use quantitative methods in the decision-making process.

Central location and dispersion

By the end of this chapter you should be able to:

❑ construct a frequency table from data collected in a group format;

❑ calculate the descriptive statistics of grouped data:

measures of central tendency;

measure of dispersion.

2.1 Analysing grouped data

On many occasions, data are grouped into a frequency table with various class intervals. To deal with such data the simplifying assumption must be made that within any given class interval the items of data fall on the class mid-point. This is equivalent to assuming that the items of data are evenly spread within any given class interval.

Exercise 2.1

The descriptive statistics for Worked Example 2.2 on page 32 of your textbook show the results from a survey of the prices of 60 items sold in a shop (Table 2.1).

Table 2.1 The frequency of items within a given price range

| Price of item (£) |Number of items sold |

|1.5–2.5 |15 |

|2.5–3.5 | 2 |

|3.5–4.5 |19 |

|4.5–5.5 |10 |

|5.5–6.5 |14 |

1. Construct a table of 7 columns.

a) Label the 7 columns in exactly the same cells as below:

b) In the column headed LCB enter the values for the lower class price boundary.

c) In the column headed UCB enter the values for the upper class price.

d) In the column headed Fi enter the data for the number of items sold.

e) Calculate the class mid-point (Xi) – sum the lower and upper class boundaries then divide by 2 which is equivalent to the average of the upper and lower class boundaries.

In cell E4: =(B4+C4)/2

f) Calculate F iX i – (e.g. multiply the first frequency value (F1) by the first class mid-point (X1).

In cell F4: =D4*E4

g) Calculate F iX i2 – remember that only the X value is squared, so multiply the first class mid-point (F1) by the first value of (F1 X1).

In cell G4: =E4*F4

h) Highlight cells E4:G4.

i) Calculate the remaining values for: X2…X8; F2X2…F8X8; F2X22…F8X82.

Copy the formulae to the bottom row of your table (row 8).

1. Cumulative frequency.

a) Start the cumulative frequency

In cell H4: =D4

b) Add each new frequency to the previous frequencies.

In cell H5: =H4+D5

Copy the formula in cell H5 to the bottom row of your data.

2. Determine the sums.

You should recall from the textbook that we have constructed the table to find the sum of three columns (ΣF i), (ΣFiXi) and (ΣFiXi2).

a) Calculate the sum of the frequency data (ΣF i).

i) Make cell D9 the active cell.

ii) Click on the AutoSum icon on the Standard Toolbar [pic]

iii) A shimmering box will appear around the cells that Excel believes to be the correct range and the range reference is entered automatically into the cell.

iv) If the range is correct, press Enter. If the range is incorrect, amend it, either with the mouse or type the correct range.

b) Calculate the sum of ΣFiX i and ΣFiXi2.

i) Make cell F9 and G9 active.

ii) Click on the AutoSum icon.

Your results should appear as:

I have used the Border icon [pic] on the formatting toolbar to improve the readability of the spreadsheet.

2.2 Calculating the descriptive statistics of grouped data

The arithmetic mean

[pic]

From our table we know that ΣFiX i is contained in cell F9 and that ΣF i is contained in cell D9. Therefore, the value for the mean is obtained by simply dividing cell F9 by D9.

[pic]

In Excel the above formula equates to: =F9/D9

Thus, in an empty cell (I have used C12) calculate the mean price.

You should find that the mean price is £4.10.

The variance

[pic]

Once again we have all the above values – ΣFiX i2 is in cell G9 and ΣF i is in cell D9. The expression to the right of the minus sign is the same as the mean squared.

[pic]

Using our table (B2:H8) in Excel this formula becomes:

In cell C13: =G9/D9-C12^2

The result is 2.123333333 giving a variance of 2.12 square £s. In the business context, this calculation has little use and is rarely interpreted. However, it does form the basis of the standard deviation.

The standard deviation

The standard deviation is simply the square root of the variance:

[pic]

We can use the result of the variance calculation to obtain the standard deviation:

In cell C14: =SQRT(C13)

OR

In cell C14: =C13^0.5

resulting in 1.4571662.

Subscript notation

Many students of quantitative methods find the subscript notation confusing so, using the current example – the table of prices, let us examine the concept.

We have five classes or groups, so j = 5. If we number each class 1 to 5 then:

when j =1: the class is £1.5 to 2.5;

when j =2: the class is £2.5–3.5;

when j =3: the class is £3.5–4.5;

when j =4: the class is £4.5–5.5;

when j =5: the class is £5.5–6.5.

Thus, when we consider the lower class boundary (LCB) of the fourth class we can express it in mathematical shorthand as LCB4. Additionally, when we have the notation j–1 then if j=4, j –1 must be 3, or the third class.

The median and the quartiles

For grouped data, the formula for calculating the median is:

[pic]

Often, we will also need to calculate the upper and lower quartiles and, since the median is also the second quartile, the formula can be changed to:

[pic]

The number of observations to the quartile position is found by subtracting the previous cumulative frequency (CF):

Quartile position – previous CF

To express this mathematically:

[pic]

where: j = the number of the class interval.

As with ungrouped data, the first step in calculating the quartiles (the median being the second quartile or Q2) is to find the quartile positions using the following formulae:

[pic]

We have found that it is easier to use decimal values rather than fractions in Excel, so we can find the quartile position as follows:

[pic]

1. Find the quartile positions.

a) Construct a table in cells E11:G16:

b) Calculate the quartile positions:

In cell F12 =0.5*($D$9+1)

Multiplying ΣFi + 1 (the contents of cell D9 plus 1, i.e. 60 + 1) by 0.5 gives the median position. Since D9 is an absolute reference, when the formula is copied or moved, the contents of D9 (ΣFi) will always be used. Thus:

Copy the formula down to cell F14.

In cell F13 amend 0.5 to 0.75

In cell F14 amend 0.5 to 0.25

You should have the following values: the median position = 30.5;

Q3 position = 45.75;

Q1 position = 15.25.

2. Calculate the quartile values:

a) Identify the class containing the quartile position.

To identify the class containing the quartile position we will use our cumulative frequencies – previously calculated in column H.

We see from column H that the total number of observations up to the second class (i.e. below 3.5) was 17. The total number of observations up to the third class (i.e. below 4.5) was 36. As the median position is 30.5 it is higher than 17 but less than 36 and it must, therefore, occur in the third class (3.5 to 4.5) or where j = 3.

b) Calculate the median value where j=3.

[pic]

From your spreadsheet you will find that when j = 3:

Cell B6 contains: LCB3 = 3.5:

Class width3 = 1: (UCB3 – LCB3 = 4.5 – 3.5 =1)

Cell F12 contains: Quartile position = 30.5

Cell H5 contains: CF3–1 = 17

Cell D6 contains: F3 = 19

Substituting these values gives:

[pic]

Since we are calculating the median value in cell G12 and considering mathematical precedence, enter:

In cell G12: =B6+1*(F12-H5)/D6

c) Calculate the upper quartile (Q3).

i) Identify the class containing Q3 by using step (a) above.

ii) Calculate the value of Q3 by using step (b) above.

d) Calculate the lower quartile (Q1).

i) Identify the class containing Q1 by using step (a) above.

ii) Calculate the value of Q1 by using step (b) above.

The interquartile range and quartile deviation

This measure is found in the same way as ungrouped data, i.e.

Interquartile range = Q3 – Q1

Since cell G13 contains the value pertaining to Q3 and cell G14 contains the lower quartile value, the interquartile range is found:

In cell G15: =G13-G14

and the quartile deviation, which is half the interquartile range, by:

In cell G16: =G15/2

Your final table should now resemble:

As the values are in pounds sterling, format rows 12 to 16 to 2 decimal places.

All the statistics for grouped data can now be interpreted as for ungrouped data.

Exercise 2.2 Consumer profile

For a company to devise an appropriate strategy, it needs to understand its customers.

The strategy of a fast food company is based on offering a higher quality of product than the opposition but charging higher prices. Consequently, data on income, spending and age are important factors if the company wishes to pursue this strategy. To this end, a questionnaire is designed which is to produce a profile of the average consumer to ascertain:

• how much they spend on the average fast food meal;

• their income bracket;

• the age of the typical customer.

The questionnaire addressed the first point by asking:

Approximately how much do you spend on an average fast food meal?

|1 |Less than £4 |

|2 |£4–£5 |

|3 |£5–£6 |

|4 |£6–£7 |

|5 |£7–£8 |

|6 |More than £8 |

The questionnaire addressed the second point by asking:

What is your approximate family income before tax?

|1 |under £10,000 |

|2 |£10,000 to £14,999 |

|3 |£15,000 to £19,999 |

|4 | £20,000 to £24,999 |

|5 | £25,000 to £29,999 |

|6 | £30,000 to £34,999 |

|7 | £35,000 to £39,999 |

|8 | £40,000 and over |

The questionnaire addressed the third point by asking:

What age are you?

|15–19 |20–24 |25–29 |30–34 |35–39 |40–49 |50–59 |60–69 |70 or over |

|1 |2 |3 |4 |5 |6 |7 |8 |9 |

The respondent indicates their expenditure range by circling the appropriate number; for example, if a customer spends an average of £6.50 they mark the number 4. On examination of the data, you will find that the numbers 1 to 6 indicate the number of each class and therefore equate to the subscript j. Thus, j = 1 to 6.

The answers from the all the completed questionnaires are then entered into a spreadsheet for you to analyse. The raw data is saved under the file name Profile.

1. Load the file Profile from the network.

2. Use row 2 for your labels, starting at G2.

3. Enter the values of j in cells G3:G8.

4. Enter your lower and upper class boundaries in columns H and I.

5. Construct frequency distributions for expenditure using the COUNTIF function.

a) In cell J3: click on the Function Wizard and select

the COUNTIF function.

b) The second step dialogue box appears:

In the range box: specify where the raw data for expenditure is stored (column A).

In the criteria box specify what you wish to count.

For this exercise we wish to count the number of occurrences of 1, then 2 and up to 6; or each occurrence of j, and since j=1 is stored in cell G3, enter G3.

c) Click on the [pic] icon: You should find that only four people spent less than £4.

d) Copy the function to include all the classes.

6. Calculate the measures of central location and dispersion of the distribution from first principles (i.e. do not use the Function Wizard).

7. Calculate the measures of central location and dispersion of the income data.

8. Calculate the measures of central location and dispersion of the age data.

What are your conclusions from this analysis?

Exercise 2. 3

Load the file basedata. For this exercise we will use the data for the revenues of firms A and B.

Comparing two values

The range

The range is the difference between the maximum and minimum value,

i.e. range = maximum – minimum.

In the construction of a frequency table, we used functions to calculate the range. There, one cell was used to find the maximum value, another to find the minimum and a third to calculate the difference between the two. This is a common technique because the two statistics are routinely calculated as part of an overall analysis of a data set.

To extend our expertise with Excel, we will use only one cell to calculate the range and examine a technique often called point and click. Point and click provides a quicker and more efficient method of entering references into your formulae and functions by using the mouse to point to the relevant cell or range then clicking with the left mouse button. The reference is entered into the formula.

Note: After you have entered the opening bracket of your function, you can point to the column heading relevant to your data (in this case column A), then click with the left mouse button. The range A:A is entered as your argument.

In an empty cell (perhaps G3) enter =MAX(A:A)-MIN(A:A)

It should be no surprise to see that the range has the value £233.09.

The range is often used in conjunction with the arithmetic mean and mode and has the advantage of being simple to calculate. It is frequently used in the stock market to show the variability of share prices over a given period. For example, if two companies have a similar average share price but the range statistic of one company is greater, then the larger range indicates greater volatility in share price, i.e. it is less stable.

By using the range A:A, instead of A2:A107, flexibility is added to your analysis. This technique allows you to update (or amend) errors in your raw data without having to change the arguments in your formula. For example, change the value in any cell in column A to 1500 and you will see that the range changes automatically to 1067.46.

Restore your data set to its original value by clicking on the undo icon [pic] – it reverts to £233.09.

From this demonstration you should now observe a major limitation of using the range as a measure of spread – the statistic is easily distorted by one value. Generally, it is advisable not to use the range as the only measure of spread, but can be informative if used as a supplement to other measures of spread such as the standard deviation or semi-interquartile range.

Remember that labels improve the clarity of your work.

The interquartile range

The interquartile range is the absolute difference between the upper quartile (Q3 … the three-quarter point) and the lower quartile (Q1 … quarter point). Expressed symbolically we have:

Interquartile range = Q3 – Q1

The calculation of the interquartile range uses a similar approach to that of the median, except that the data array is divided into four equal portions or quartiles instead of two. The values forming the lowest 25% and the highest 25% of the array are ignored – leaving only the central 50%.

Once the data set was sorted into an array, the median divided the array into two equal portions. Thus, exactly half the data values were less than the median value (547.91) and exactly half were greater. Figure 2.1 shows that the interquartile range contains only the middle 50% of the array.

[pic]

figure 2.1 The relationship of the quartiles

We will examine two procedures to calculate the interquartile range – both similar to the methods used to find the range:

Method 1 – calculating the quartiles

1. In cell G4 find the upper quartile (Q3) =QUARTILE(A:A,3)

2. In cell G5 find the lower quartile (Q1) =QUARTILE(A:A,1)

3. In cell G6 find the interquartile range (Q3 – Q1) =G4-G5

Method 2 – combining functions

In cell G7 enter: =QUARTILE(A:A,3)-QUARTILE(A:A,1)

Either method should provide the same result of 75.97.

As the interquartile range is only slightly affected by extreme scores, it provides a good measure of spread for skewed distributions but is seldom used for data that are approximately normally distributed.

For a full description and syntax of the QUARTILE function use the help facility; then add its contents to your library of functions.

Semi-interquartile range

The semi-interquartile range, or quartile deviation, is simply half the interquartile range and has similar advantages and disadvantages as the interquartile range. The formula for semi-interquartile range is therefore:

[pic]

In Excel this is calculated as: =(QUARTILE(A:A,3)- QUARTILE(A:A,1))/2

You will note that an extra set of brackets have been used to ensure that the numerator is calculated before the denominator.

Now calculate the quartile deviation in cell G8.

Continuity correction

Excel’s QUARTILE function ignores continuity correction so, strictly speaking, we should make slight adjustments when finding the quartile values of a distribution (see page 13 in your textbook). However, for most business purposes the value returned by the function is adequate. Figure 2.2 illustrates the above calculations.

[pic]

figure 2.2: Measuring the dispersion by comparing two points

Comparing all the values

When using only two points in a data set to measure of dispersion then the odd high or low value gives a false impression. A better approach, therefore, would be to use every value in the data set then compare it against some benchmark. The general approach is to use the arithmetic mean as a benchmark and then calculate how far each variable is from the benchmark. In other words, we calculate how far a value deviates, or varies, from the mean. The sum of squared deviations of scores from their mean is lower than their squared deviations from any other number.

Expressed mathematically: [pic], where Xi are the variables and [pic]is the arithmetic mean. Therefore, if there are 1000 variables in a data set of one column, we must first calculate the arithmetic mean of the data and then find the difference from the mean of all 1000 variables. Fortunately, the use of a spreadsheet means we need only insert the formula once and then copy it down the column to make the other 999 calculations – the spreadsheet makes the appropriate adjustments to the formulae.

However, before we can exploit the spreadsheet’s capabilities we need to understand how formulae and functions are adjusted when they are moved or copied.

Types of reference

When a formula is copied from one location to another in the worksheet, Excel adjusts the cell references in the formula relative to their new positions. It is, therefore, important to select the correct type of reference. Excel recognizes three types of references:

1. relative;

2. absolute;

3. mixed.

Relative references

A relative reference can be compared with giving someone directions that explain how to reach a location from a specific starting point. For example, if we were to say ‘Mrs. Smith’s home is the third house down from this one on the other side of the road’ then it is only valid from one location. Therefore, if you move location the directions will need to be different.

If a formula containing a relative reference is copied, then that relationship is also copied. For example, if cell B9 contains the formula =B4+B5+B6+B7 and is then copied to cell C9, Excel would automatically change the formula to =C4+C5+C6+C7.

Absolute references

Frequently, you will not want the cell references to be adjusted when a formula is copied – thus the reference will need to be made absolute. Using an absolute reference is similar to providing a street address – ‘Mrs Smith lives at 19 Railway Cutting’. In this form of reference, the final destination will always be that particular house in Railway Cutting – no matter where the directions are given from.

In Excel, an absolute reference is designated by adding a dollar sign ($) before the column letter and row number. A reference such as $D$4 will always refer to cell D4 no matter where the formula is copied to. For example, if the formula in B10 is =B8/$D$4 and then copied to cell C10, the formula will become =C8/$D$4.

Mixed references

A mixed reference is designated by adding a dollar sign ($) before the column letter or row number. For example, in the mixed reference $A4 the column reference ($A) is absolute but the row reference (4) is relative.

Absolute vs relative:

A relative reference: is a cell reference in a formula that is adjusted when the formula is copied.

An absolute reference: is a cell reference in a formula that does not change when copied to a new location.

A mixed reference: is a reference that is only partially absolute, such as A$2 or $A2. When a formula that uses a mixed reference is copied to another cell, only part of the cell reference is adjusted.

Changing the reference type

1. Select the cell where the results of your calculation are to appear.

2. Enter the formula containing the reference.

3. Position the cursor within the relevant reference in the formula bar.

4. Press the F4 key once and a $ sign will appear with the cell reference ($A$l). Each time you press the F4 key the reference changes in the following order:

5. Relative (A1); Absolute ($A$1); Mixed (A$1); Mixed ($A1); Relative(A1).

Copying formulae using the mouse and standard toolbar

1. Select the cell containing the formula that you wish to copy.

2. Click on the copy icon [pic] on the standard toolbar – a shimmering outline will appear around the selected cells.

3. Select the cell(s) into which you wish to copy the formula.

4. Click on the paste icon [pic] on the standard toolbar.

Copying formulae by dragging

1. Select the cell containing the formula that you wish to copy (make it active).

2. Move to the cell handle – the cursor changes to a thin black cross. [pic]

3. Using the mouse, drag the cursor to the last cell in the range.

4. You will find that you will only be able to copy the formula horizontally or vertically. To move diagonally, copy in one direction first (either horizontally or vertically) and then drag in the other direction.

We will continue using the revenue data of firm A in the file basedata. Your spreadsheet should contain your earlier measures of dispersion.

To understand the logic involved in variation, we will calculate from first principles then examine the various functions offered by the spreadsheet.

So add 4 extra rows above row 2.

Inserting rows or columns

Inserting rows

1. Click on the row heading (in this case row 2).

2. Select Insert on the menu bar.

3. On the drop-down menu: select Rows.

4. A new row will be added ABOVE the selected row.

5. Should you need to add further rows, hold down the control key and press the Y key for each additional row.

Inserting columns

1. Click on the column heading.

2. Select Insert on the menu bar.

3. On the drop-down menu: select Columns.

4. A new column will be added to the LEFT of the selected row.

5. Should you need to add further columns, hold down the control key and press the Y key for each additional column.

Now we can move on to calculating the remaining measures of dispersion.

Mean deviation

The mean deviation is simply the arithmetic mean of all the deviations. It is expressed in mathematical symbols as:

[pic]

Which is translated as: sum (Σ) all the absolute deviations [pic]and divide by the number of deviations (n). The straight brackets (modulus) specify that the sign is ignored, i.e., an absolute value.

1. Calculate [pic] (arithmetic mean) of the raw data.

In cell G14: Use the AVERAGE function to calculate the mean of your raw data.

2. Calculate all the absolute deviations[pic].

Calculate the first absolute deviation from the mean:

In cell B7 enter: =ABS(A7-$G$14)

This is equivalent to: [pic]. The absolute function ensures the returned value is positive. [pic]

Copy the formula down to cell B111.

Since A7 is a relative reference, as it is copied down it becomes A8 then A9 up to A111. However, since G14 is an absolute reference, denoted by the $’s, it will always refer to G14 – the mean.

In cell B6 label your new data set: Abs Devs

3. Calculate the sum of all the absolute deviations.

In cell B4: Use the Sum function to find the sum of all the deviations (contained in cells B7:B111).

4. Find the mean of the deviations.

In cell G15: Use the COUNT function to find n.

In cell G17 enter: =B4/G15

Since B4 equals[pic] and G15 equals n, G17 [pic].

As a result of this statistic we can now say that on average the revenue of Firm A is £542.29 plus or minus £42.35.

The function =AVEDEV(A:A) returns the mean deviation in one cell and thus does away with the need for the above four steps.

Variance

[pic]

1. Calculate [pic] (arithmetic mean) of the raw data.

We already have this value in cell G14

2. Calculate the square of each deviation[pic].

Calculate the first square deviation from the mean:

In cell C7 enter: =(A7-$G$14)^2

The ^ means to the power of.

Here we have subtracted the mean of the raw data from the first variable (X1) and squared the result

Copy the formula down to cell C111.

Thus we have subtracted the mean of the raw data from the each successive variable (X2, X3, X4… X105), squaring each result.

In cell C6 label your new data set: Sq Devs

3. Calculate the sum of all the squared deviations.

In cell C4: Use the Sum function to find the sum of all the squared deviations (contained in cells C7:C111).

4. Find the mean of the deviations.

Use cell G15 in which you have already found n.

In cell G18 enter: =C4/G15

Since C4 equals [pic] and G15 equals n, then,

G18 [pic].

From our calculation we can say that the variance of Firm A’s revenue is £2687.872 – which has little meaning. As the result is given in square units, the variance is rarely used in business statistics.

The function =VARP(A:A) returns the variance.

Standard deviation

Although variance is a useful measure, results measured in squares of numbers make it tricky to relate the variance to the original data. This problem is easily remedied by simply finding the square root of the variance and calling the measure standard deviation.

The standard deviation is the most commonly used measure of dispersion. If a data set contains an odd extreme high and/or low value, the standard deviation imparts a more practical measurement than the range. It is also mathematically flexible e.g., many formulae in inferential statistics use the standard deviation.

The formula for calculating the standard deviation is:

[pic]

which is the square root of the variance. Consequently, we follow the same procedure for calculating the variance but, in addition, we can either use the square root function or raise the original equation to the power of 0.5 as follows:

Either,

In cell G19 enter: =SQRT(C4/G15)

or,

In cell G19 enter: =(C4/G15)^0.5

Both methods produce the same result of 51.84464224, as would the function =STDEVP(A:A).

Co-efficient of variation

Where two data sets with markedly different means are compared, the standard deviation is used in conjunction with the mean to calculate the coefficient of variation (see page 46 of the textbook).

The standard deviation is not a good measure of spread in highly-skewed distributions and should be supplemented in those cases by the semi-interquartile range.

Now calculate all the measures of dispersion for firm B on Sheet 2.

You should now be capable of using Excel to illustrate your data and to calculate the measures of central tendency and of dispersion.

Exercise 2.4

Practice your techniques by examining the data on Sheet 3 of the file Basedata, which shows the output in kilograms of two production machines measured over 84 shifts. Use your illustrative and statistical techniques to decide if one machine performs better than the other.

You need to calculate all the statistical measures you have been taught to date and then decide on the most appropriate measure of central tendency and dispersion, depending on the way the data is distributed.

Regression and correlation

By the end of this chapter you should be able to:

❑ calculate the regression equation:

❑ calculate the co-efficient of determination (R2).

3.1 Regression analysis

Spreadsheets or statistical packages, with their abundance of functions, will provide all the necessary statistics simply and quickly. However, an understanding of the basic principles will provide a foundation for developing regression further. Consequently, we will start by using the first principles of the coding formula (see textbook, p. 55) to calculate the necessary statistics.

The data below is based upon Luigi’s ice cream business and is used to find the regression equation or line of best fit using a spreadsheet. In this example, it is assumed that the daily temperature influences the number of ice creams sold, as follows:

|Temp (ºC) |Sales |

|15 |272 |

|16 |282 |

|20 |315 |

|18 |289 |

|21 |320 |

|19 |301 |

table 3.1: Papa Luigi’s Ices

figure 3.1 Scatter diagram of Papa Luigi’s Ices

A scatter diagram using the data in Table 3.1 gives the graph shown in Figure 3.1.

It is obvious from the diagram that the six points do not lie on one straight line. However, it is reasonable to try to obtain the straight line that comes as close as possible to as many of the points as possible. This could be attempted by drawing a straight line onto the graph that appears to come as close to the points as possible. Such an approach would plainly be quite subjective, with different people drawing (slightly) different straight lines.

This can be achieved mathematically by calculating the parameters of the straight line (m and c) that comes closest to these points. A technique to minimise the sum of the squared deviations of the observations from the line is used. For this reason, it is called the method of least squares – using an approach similar to that used in the calculations of grouped data.

Exercise 3.1 Calculating the regression equation from first principles

Remember that we require an equation such that:

Y = mX + c

We use the coding formula to obtain values for m and c of the linear equation linking Sales to Temperature. First, calculate the slope, m.

[pic]

This value is then used to calculate c.

1. Assign 7 columns of your spreadsheet.

2. Label the columns: X Y x y xy x^2 y^2

3. Enter the Temperature (X) and the Sales (Y) data into the appropriate columns.

4. Calculate[pic]in 2 empty cells above the X and Y values.

5. Complete the rest of the first row of the table. Appropriate use of the $ symbol will allow the formulae to be copied down the length of the table.

a) In the cell below the label x: Enter a formula to find the value of x

where: [pic].

(Consider using a mixed reference.)

b) In the cell below the label y: Enter a formula to find the value of y

where: [pic].

c) In the column labelled xy enter a formula which multiplies the calculated value of x by the calculated value of y.

d) In the column labelled x^2, square the values of x.

e) Copy the above formula one cell to the right (into the column labelled y^2).

6. Complete the rest of the table.

a) Highlight the cells containing the above 5 calculations.

b) Copy to the bottom of the table.

7. Sum the last 3 columns.

Your spreadsheet should now resemble Table 3.2.

[pic]

table 3.2

The cell references that follow assume a layout as in Table 3.2. If your layout differs, adjust the following cell references as necessary.

8. Format columns E:I to 4 decimal places.

9. Calculate the m value using the coding formula:

[pic]

a) Label cell G1: m (Slope)

b) In cell G2: Enter the formula to calculate the slope.

It should return the value 7.9938

Use this value in conjunction with [pic] to calculate c: where [pic].

[pic] have already been calculated.

10. Calculate c.

a) Label cell H1: c (Intercept)

b) In cell H2: Enter the intercept formula.

It should return the value 151.2796

11. Calculate R2 where:

[pic]

Again the values for: [pic], [pic]and [pic]have all been calculated (cells G11:I11 in Table 2).

a) Label cell I1: R^2

(b) In cell H2: Enter the intercept formula.

It should return the value 0.9625.

Interpretation

The value of m (in this case 7.9938) is also known as the X coefficient. So a negative coefficient would show an inverse relationship: as one variable increases, the other decreases.

However, the value of m is positive, demonstrating a positive relationship between the two variables: as one increases so does the other. This is what we expected from these two variables.

As R2 is 0.9625, which is greater than 0.9 and close to one (perfect correlation) there is an extremely strong relationship between the two variables. It seems reasonable to conclude that there is a strong positive relationship between sales of ice cream and the daily temperature. We can also say that 96.25% of the sales of ice cream can be explained by the daily temperature, whilst 3.75% is the result of other factors.

However, caution needs to be exercised when interpreting R2.

The coefficient measures the strength of the relationship between two variables – it does not measure cause and effect.

The coefficient measures the strength of the linear relationship between the two variables. A low value for R2, therefore, implies little evidence of a linear relationship but the relationship could be curvilinear.

Exercise 3.2

Use a spreadsheet to solve self-check question 3.3 on page 58 of the key text, then check your calculations with the answers on pages 330–331.

Probability distributions

4.1 The Normal distribution and Z statistic

By the end of this chapter you should be able to:

❑ examine a data set, graphically, for a normal distribution;

❑ standardize a data set which is normally distributed;

❑ use Excel functions to:

convert Z-values to a probability;

convert a probability to a Z-value.

Calculating the area under the curve of the normal distribution can be achieved by finding the Z-value (or Z-statistic).

The Z-statistic is simply the number of standard deviations of a variable from its mean value. For values less than the mean, the sign of the Z-statistic is negative; for values above the mean, the sign of the Z-statistic is positive.

The process of converting variables to a Z-statistic is known as transformation or standardizing the normal distribution.

Standardizing a distribution uses the equation:

[pic]

where: Z = the number of standard deviations from the mean

Xi = the value to be standardized

μ = the mean

σ = the standard deviation

Exercise 4.1

It is known from past experience that the life of a machine component is approximately normally distributed with a mean (μ) of 200 hours and a standard deviation (σ) of 4 hours. Calculate the probability that an individual component, chosen at random has a life of:

a) at least 206 hours;

b) less than 198 hours;

c) between 204 and 208 hours.

[pic]

Solving the problem with Excel is straightforward where all we need to do is to substitute the given values into the equation.

1. Calculate the Z-statistic for 206 hours.

a) In cell B3 enter the label: X-value

b) In cell B4 enter the X value: 206

c) In cell C3 enter the label: Z-value

d) In cell C4 enter the following equation as an Excel formula:

[pic]

where: Z = the number of standard deviations from the mean

Xi = the value to be standardized 206

μ = the mean 200

σ = the standard deviation 4

If you obtain the value 181 you have obviously made a mistake!

You have entered the formula: =206-200/4 which is incorrect.

Excel formulae follows mathematical precedent so you need to insert brackets around the numerator, i.e., =(206-200)/4, which will return the correct Z-value of 1.5.

We can now say that in this distribution with a mean of 200 and standard deviation of 4, 206 has a Z-value of 1.5.

We could also say that 206 is situated 1.5 standard deviations away from the mean and, since the value is positive, it is to the right of the mean.

The STANDARDIZE(x,mean,standard_dev) function also calculates the Z-statistic where:

X is the value you wish to standardize.

Mean is the arithmetic mean of the distribution.

Standard_dev is the standard deviation of the distribution.

For a full description of the function, use the help facility.

e) Click on cell C4, enter: =STANDARDIZE(206,200,4)

Again you should obtain a Z-value of 1.5.

Since cell B4 contains the our X-value of 206 we could simply have entered =STANDARDIZE(B4,200,4).

2. Convert the Z-statistic to a P value.

a) Using first principles

i) Use your tables to look up the probability of obtaining a value of 1.5. The answer should be (0.9332).

As the tables give the area to the left, there is a 93.32% probability that values will be 206 or less.

ii) However, we want 206 or greater, and any value that is greater must be to the right of the distribution:

Area = 1 – 0.9332 = 0.0668

b) Using an Excel function

The NORMSDIST(z) function(where z = the Z-value) is equivalent to looking up the Z-statistic in tables and also gives the area to the left of the distribution.

i) In cell D3 enter the label: P-value

ii) In cell D4 enter: =1-NORMSDIST(1.5)

Since cell C4 contains our Z-statistic of 1.5 we could simply have entered =NORMSDIST(C4)

The answer should be 0.066807 which gives a prob-

ability that 6.6807% of all values in the distribution will be greater than 206.

By subtracting NORMSDIST(z) from 1, we will obtain the probability to the right of the X value (i.e. values greater than X).

Out of every 100 components, 6.68 (almost 7) have a life of at least 206 hours.

3. Calculate the probability for 198 hours.

[pic]

a) In cell B5 enter the X value: 198

b) In cell C5 enter: Standardize the X value (see 1(e) above)

You should obtain a z-value of -0.5.

c) In cell D5 enter: =NORMSDIST(C5)

You should obtain a probability of 0.308538.

Out of every 100 components, 30.85, or almost one-third, have a life of less than 198 hours.

Calculate the probability that an individual component, chosen at random, has a life of between 204 and 208 hours

4. Calculate the probability for 204 to 208 hours.

[pic]

a) In cell B7 enter the higher X value: 208

b) In cell C7 enter: Standardise the X value

You should obtain a z-value of 2.

c) In cell D7 enter: =NORMSDIST(C7)

You should obtain a probability of 0.97725.

d) In cell B8 enter the lower X value: 204

e) In cell C8 enter: Standardise the X value

You should obtain a z-value of 1.

f) In cell D8 enter: =NORMSDIST(C8)

You should obtain a probability of 0.841345.

g) In cell D9 subtract the lower p-value (0.841345 in cell D7) from the higher (0.97725 in cell D8) – a result of 0.135905.

Therefore, 13.59 out of every 100 components have a life of between 204 and 208 hours.

Summary

Whenever we use Excel functions in place of tables the resulting value gives the area to the left of the distribution. Therefore, if we need to find the value of:

X or less, use NORMSDIST(Z)

X or more, use 1-NORMSDIST(Z)

Whenever the probability between 2 values is required:

1. Find the probability of the higher value.

2. Find the probability of the lower value.

3. Subtract the lower probability from the higher.

4.2 Converting a probability into a Z-statistic

When using standard normal tables we find the Z-statistic from a probability, or ρ -value, by:

a) locating the relevant ρ -value in the body of the tables;

b) in the appropriate row read off the z-score (shown in the row heading)

c) in the appropriate column read off the decimal value shown in the column heading and add to part (b) above.

Excel uses the NORMSINV function to convert probability to a Z-value as follows:

• Syntax: NORMSINV(probability)

Probability is a probability corresponding to the normal distribution.

• Remarks

❑ If probability is non-numeric, NORMSINV returns the #VALUE! error value.

❑ If probability < 0 or if probability > 1, NORMSINV returns the #NUM! error value.

❑ The result given is for the LEFT of the normal distribution.

Exercise 4.2

1 What is the Z-value where P < 0.908789 ?

As the probability is less than ( 0.235641?

As the probability is greater than (>), the result is the probability to the RIGHT of the normal distribution, so we need to subtract the probability from 1:

In a blank cell enter: =NORMSINV(1-0.235641)

The result equals: 0.720395 (i.e. Z = 0.720395)

Exercise 4.3

Using Excel functions, find the value of Z1 for each of the following:

a) P(Z > Z 1) = 0.3300;

b) P(Z < Z 1) = 0.9732;

c) P(Z < Z 1) = 0.8212;

d) P(Z < Z 1) = 0.5478;

For answers see page 419 of your textbook.

Exercise 4.4

The life of a certain battery is known to be normally distributed with a mean of 30 hours and a standard deviation of 30 minutes. If 1.7% of the batteries last longer than K hours, find the value of K.

Solution

1. In a vertical column of 5 cells enter the labels μ, σ, ρ, Z and K

(I have used cells D4:D8).

2. In the appropriate cells of column E enter the values for:

a) μ and σ ;

b) ρ as decimal (i.e. 0.017).

3. Find the Z-statistic of the probability (ρ).

As we are interested in 1.7% or longer, the probability given is the area to the right of the distribution. Excel’s NORMSINV function gives the area to the left, thus we must subtract the ρ -value from 1. If you have used the same cells as I have then:

(a) In cell E7 enter: =NORMSINV(1-E6)

4. Use the equation: So far we have known the specific value for the variable X and then calculated the probabilities with which these values might occur. If we know the probability and need to find the value of X associated with that probability then we need to rearrange the equation to solve for K as follows:

[pic]

Therefore in cell E8 enter the above equation in Excel format to calculate the value of K. Your results should resemble:

[pic]

5. Interpret your findings.

With a mean of 30 hours and a standard deviation of 30 minutes, 1.7% of batteries last longer than 31.06 hours.

-----------------------

204 208

198 200

200 206

[pic]

[pic]

[pic]

Jointly they give the cell reference, for example, A1, D5.

[pic]

Where:

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download