
College of Engineering and Computer Science

Mechanical Engineering Department

Mechanical Engineering 309

Numerical Analysis of Engineering Systems

Spring 2002   Ticket: 57501   Instructor: Larry Caretto

Lecture Notes on Linear Regression

Objective

These notes introduce the concept of regression analysis, which is used to estimate a relationship between two or more variables. The goal is to develop an understanding of the use of the Excel functions for linear regression.

Background

If several measurements are made on pairs of experimental data {(xi,yi), i = 1,...,N} it is possible to determine an approximate equation of a straight line that gives a best fit to the data. The equation of this best-fit line is written as follows.

\hat{y} = a + b x

In this equation, we use the symbol ŷ instead of y to indicate that the predicted value found from the equation ŷ = a + b x is an approximate result. For a given data point, (xi, yi), the value of yi represents the actual data, and we would obtain the predicted value of y at the point x = xi from the equation ŷi = a + b xi. The difference between the measured and predicted value is yi - ŷi.

[Figure: scatter plot of first-semester CSUN GPA (y) versus high school GPA (x), with the best-fit straight line drawn through the scattered data points.]

For example, x may represent data on high school grade point averages of students entering CSUN and y may represent their first semester grade point average here. We would like to use these data to develop a relationship that would allow us to predict the CSUN grade point average (y) from the high school grade point average. The chart shows that the actual data are scattered, but we are trying to find the equation for a line that best fits the data.

In finding the best-fit line we not only want to determine the quantities a and b that give us the equation for the line, we would also like to have some measure of how well the line fits the data. The equations to determine these quantities are given in this section.

The equations used to calculate a and b can be found by an analysis which minimizes the distances between the actual data points, yi, and the fitted points, ŷi = a + b xi. That is, one seeks the values of a and b that minimize the following sum of squared differences:

S = \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2 = \sum_{i=1}^{N} \left(y_i - a - b\,x_i\right)^2

Taking partial derivatives of this sum with respect to a and to b and setting the results equal to zero gives the following results for a and b in terms of the entire set of data, {xi, yi}, using the definitions of mean values:

\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i \qquad \text{and} \qquad \bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i
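Setting both partial derivatives to zero gives two simultaneous linear equations for a and b, usually called the normal equations. This intermediate step is standard least-squares algebra and is filled in here for completeness:

\frac{\partial S}{\partial a} = -2\sum_{i=1}^{N}\left(y_i - a - b\,x_i\right) = 0 \qquad\qquad \frac{\partial S}{\partial b} = -2\sum_{i=1}^{N} x_i\left(y_i - a - b\,x_i\right) = 0

Dividing the first equation by N shows that a = ȳ - b x̄; substituting this result into the second equation and solving for b gives the formulas below.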

The regression equations for the slope, b, and the intercept, a, are given below:

b = \frac{\sum_{i=1}^{N} x_i\, y_i - N\,\bar{x}\,\bar{y}}{\sum_{i=1}^{N} x_i^2 - N\,\bar{x}^2} \qquad \text{and} \qquad a = \bar{y} - b\,\bar{x}

A statistical estimate of the variability can be found from the difference between the actual data points yi and the estimated values ŷi = a + b xi. This measure, which is called the standard error and has the symbol sy|x, is defined as follows:

s_{y|x} = \sqrt{\frac{\sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2}{N - 2}}

Another measure, called the R2 value, is considered to be a measure of the amount of variation in the data that is explained by the regression equation. An R2 value of zero means that the regression cannot explain any of the variation in y; an R2 value of one means that all the variation in y can be explained by the regression equation. The value of R2 is computed from the following equation:

R^2 = 1 - \frac{\sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N} \left(y_i - \bar{y}\right)^2}
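The formulas above map directly onto a short loop over the data. The following C++ sketch is an illustration added to these notes, not part of the original course code; the function and variable names are assumptions made for the example. It computes the slope, intercept, standard error, and R2 for one independent variable:

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Results of a single-variable least-squares fit y-hat = a + b*x.
struct FitResult {
    double a;    // intercept
    double b;    // slope
    double syx;  // standard error of the estimate, s_y|x
    double r2;   // coefficient of determination, R^2
};

// Least-squares fit using the slope/intercept, s_y|x, and R^2 formulas above.
FitResult linearFit(const std::vector<double>& x, const std::vector<double>& y)
{
    const int N = static_cast<int>(x.size());
    double sumX = 0.0, sumY = 0.0, sumXY = 0.0, sumXX = 0.0;
    for (int i = 0; i < N; ++i) {
        sumX  += x[i];
        sumY  += y[i];
        sumXY += x[i] * y[i];
        sumXX += x[i] * x[i];
    }
    const double xBar = sumX / N;
    const double yBar = sumY / N;

    FitResult fit;
    fit.b = (sumXY - N * xBar * yBar) / (sumXX - N * xBar * xBar);
    fit.a = yBar - fit.b * xBar;

    // Residual and total sums of squares give s_y|x and R^2.
    double ssResid = 0.0, ssTotal = 0.0;
    for (int i = 0; i < N; ++i) {
        const double yHat = fit.a + fit.b * x[i];
        ssResid += (y[i] - yHat) * (y[i] - yHat);
        ssTotal += (y[i] - yBar) * (y[i] - yBar);
    }
    fit.syx = std::sqrt(ssResid / (N - 2));
    fit.r2  = 1.0 - ssResid / ssTotal;
    return fit;
}

int main()
{
    // Test data from the Example Calculation section of these notes.
    std::vector<double> x = {510, 533, 603, 670, 750};
    std::vector<double> y = {1.3, 0.1, 1.5, 1.8, 3.9};
    const FitResult fit = linearFit(x, y);
    std::printf("a = %g  b = %g  R2 = %g  s = %g\n", fit.a, fit.b, fit.r2, fit.syx);
    return 0;
}
```

Run on the five test points listed in the Example Calculation section later in these notes, this reproduces a ≈ -5.776, b ≈ 0.01222, and R2 ≈ 0.768.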

The analysis above, with one independent variable, x, and one dependent variable, y, can be expanded to consider the case where several independent data items, xj, are used to predict one dependent data item. For example, we could use existing data on students' high school grade point averages, SAT Verbal and Math scores, and CSUN GPAs to develop an equation that could be used to predict students' CSUN grade point averages as a function of the other three variables. In this example, we would want to find the coefficients b0, b1, b2, and b3 in the equation:

CSUN GPA = b0 + b1 (High School GPA) + b2 (SAT Verbal) + b3 (SAT Math)

The above example uses three predictive variables; these could be labeled x = High School GPA, y = SAT Verbal and z = SAT Math. We are trying to predict a fourth variable, CSUN GPA. In general we can have K predictive variables, where K will change from problem to problem. We need a notation that will readily accommodate the ability to code a different number of predictive variables. To do this we label the predictive variables as x1 to xK. In the previous example, x1 = High School GPA, x2 = SAT Verbal and x3 = SAT Math. The variable that we are trying to predict is labeled y. In the example above, y is the CSUN GPA. In this general case, where there are K different variables used to predict y, the regression equation can be written as follows:

\hat{y} = b_0 + \sum_{j=1}^{K} b_j\, x_j

We continue to use the symbol ŷ instead of y to indicate that the predicted value found from the above equation is an approximate result.

We will have several sets of input data. In general, each input data set will consist of one value of y and one value for each of the xj. In the example of predicting the CSUN GPA, each data set would consist of the CSUN GPA, the High School GPA, the SAT Math Score and the SAT Verbal score for one student. In general we have, for the mth data set one value of ym and K values of the x variables (x1m, x2m,..., xKm). The notation xim means the value of the variable xi for the mth data set. In our CSUN GPA example, where x2 is the notation for the SAT verbal score, the symbol x24 represents the SAT Verbal score of the 4th student. If we have N sets of data we will have N(K+1) different numerical values in the input data set. These are used to determine the values of the K+1 coefficients b0 to bK.

The coefficients b0 to bK are determined by minimizing the sum of residuals squared, just as was done in the case of one independent variable, x, and one dependent variable, y. Only now we have to take partial derivatives with respect to all the b coefficients from b0 to bK. When we do this we will get a set of K+1 simultaneous linear equations. The unknowns in those equations are the b0 to bK coefficients. The linear equation coefficients are denoted as Aij and the right-hand sides of the linear equations are given the symbol ci. With this notation the system of linear equations is written as follows:

\sum_{j=0}^{K} A_{ij}\, b_j = c_i \qquad i = 0, 1, \ldots, K

The coefficients, Aij, and the right-hand sides, ci, are found from various sums of the input data as shown below. The computations are simplified if a fictitious variable, x0, is defined such that the value of x0 for each data set, m, is one.

x0m = 1 m = 1, …, N

With this definition the equations for computing Aij and ci can be written as follows:

A_{ij} = \sum_{m=1}^{N} x_{im}\, x_{jm} \qquad \text{and} \qquad c_i = \sum_{m=1}^{N} x_{im}\, y_m

Note that the coefficients Aij are symmetrical; that is Aij = Aji. We can use this symmetry relation to reduce the number of coefficients that need to be computed.

The table below shows eight sets of data for the example we have been using of relating the CSUN GPA (y) to the high school GPA (x1), the SAT Verbal score (x2) and the SAT Math score (x3). For this collection of data, then, K = 3, N = 8, and we need to evaluate 4 different values of bj. As a sample calculation, the coefficient

A_{23} = \sum_{m=1}^{8} x_{2m}\, x_{3m} = x_{21}x_{31} + x_{22}x_{32} + x_{23}x_{33} + x_{24}x_{34} + x_{25}x_{35} + x_{26}x_{36} + x_{27}x_{37} + x_{28}x_{38} = (440)(500) + (350)(400) + (440)(540) + (350)(370) + (450)(480) + (200)(320) + (310)(470) + (290)(400)

Note that this is the same as A32.

Sample Data Set: Multivariate Regression (K = 3 variables; N = 8 data sets)

|m |y |x1 |x2 |x3 |
|1 |2.55 |3.00 |440 |500 |
|2 |1.95 |3.47 |350 |400 |
|3 |1.89 |3.14 |440 |540 |
|4 |2.24 |3.46 |350 |370 |
|5 |2.31 |3.59 |450 |480 |
|6 |1.74 |1.75 |200 |320 |
|7 |1.87 |3.03 |310 |470 |
|8 |0.83 |3.18 |290 |400 |

Of course, this summation would be done within a for loop in a Visual Basic or C++ program. Once the Aij coefficients (and the ci right-hand sides) are computed, it is necessary to solve the set of simultaneous linear equations to obtain the bj coefficients.
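As an illustration (a sketch added to these notes rather than the course's own code; the function name and array layout are assumptions), the loop could be organized as follows, using the fictitious variable x0m = 1 and the symmetry Aij = Aji to build the coefficients and right-hand sides:

```cpp
#include <vector>

// Build the normal-equation coefficients A[i][j] and right-hand sides c[i]
// for multiple linear regression with K predictive variables.
//   x[m][j-1] holds the value of x_j for data set m (j = 1..K),
//   y[m] is the dependent value for data set m.
void buildNormalEquations(const std::vector<std::vector<double>>& x,
                          const std::vector<double>& y,
                          int K,
                          std::vector<std::vector<double>>& A,
                          std::vector<double>& c)
{
    const int N = static_cast<int>(y.size());
    A.assign(K + 1, std::vector<double>(K + 1, 0.0));
    c.assign(K + 1, 0.0);

    for (int m = 0; m < N; ++m) {
        // xm[0] = 1 is the fictitious variable x0 for this data set;
        // xm[1]..xm[K] are the predictive variables.
        std::vector<double> xm(K + 1, 1.0);
        for (int j = 1; j <= K; ++j) xm[j] = x[m][j - 1];

        for (int i = 0; i <= K; ++i) {
            c[i] += xm[i] * y[m];
            // Compute only j >= i; the lower triangle follows by symmetry.
            for (int j = i; j <= K; ++j) A[i][j] += xm[i] * xm[j];
        }
    }
    for (int i = 0; i <= K; ++i)
        for (int j = 0; j < i; ++j) A[i][j] = A[j][i];

    // The K+1 simultaneous linear equations sum_j A[i][j]*b[j] = c[i] can now
    // be solved for b0..bK with any linear-equation routine such as Gaussian
    // elimination.
}
```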

Once the coefficients are known, it is possible to compute a standard error, as was done for the case of one independent variable. However, the computation of the estimate, ŷm, is more complex, because it depends on all the computed bj coefficients and all the xj values for the data point. Also, in this case the denominator of N - 2 is replaced by N - K - 1, where K + 1 is the number of b coefficients determined by the regression. The equation for the standard error, in this case, is shown below.

s_{y|x} = \sqrt{\frac{\sum_{m=1}^{N} \left(y_m - \hat{y}_m\right)^2}{N - K - 1}}

The equation for R2 can be written as follows for the multivariable case.

R^2 = 1 - \frac{\sum_{m=1}^{N} \left(y_m - \hat{y}_m\right)^2}{\sum_{m=1}^{N} \left(y_m - \bar{y}\right)^2}
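Once the bj coefficients have been found, both of these statistics follow from one more pass through the data. The short C++ sketch below is again an added illustration, assuming the same array layout as the previous sketch; the function name is hypothetical:

```cpp
#include <cmath>
#include <vector>

// Standard error s_y|x and R^2 for a multivariable fit y-hat = b0 + sum bj*xj.
//   x[m][j-1] holds x_j for data set m (j = 1..K); b holds the K+1 values b0..bK.
void fitStatistics(const std::vector<std::vector<double>>& x,
                   const std::vector<double>& y,
                   const std::vector<double>& b,
                   int K, double& syx, double& r2)
{
    const int N = static_cast<int>(y.size());
    double yBar = 0.0;
    for (int m = 0; m < N; ++m) yBar += y[m];
    yBar /= N;

    double ssResid = 0.0, ssTotal = 0.0;
    for (int m = 0; m < N; ++m) {
        double yHat = b[0];                       // b0 term (fictitious x0 = 1)
        for (int j = 1; j <= K; ++j) yHat += b[j] * x[m][j - 1];
        ssResid += (y[m] - yHat) * (y[m] - yHat);
        ssTotal += (y[m] - yBar) * (y[m] - yBar);
    }
    syx = std::sqrt(ssResid / (N - K - 1));       // N - K - 1 degrees of freedom
    r2  = 1.0 - ssResid / ssTotal;
}
```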

Use of the Excel LINEST Function

LINEST is an Excel array function that is used for linear regression. It can handle one or more independent variables. When the full statistics are requested, the results of the LINEST function are written into an area that contains five rows. The number of columns is equal to the number of variables, including y. Thus, for use with one x and one y, the area would be two columns wide. For the example used above with three independent x variables and one y variable, the area for LINEST would be four columns wide.

The function is invoked by the following command:

LINEST(known_y's,known_x's,const,stats)

Known_y's   is the set of y-values you already know in the relationship y = mx + b.

• If the array known_y's is in a single column, then each column of known_x's is interpreted as a separate variable.

• If the array known_y's is in a single row, then each row of known_x's is interpreted as a separate variable.

Known_x's   is an optional set of x-values that you may already know in the relationship y = mx + b.

• The array known_x's can include one or more sets of variables. If only one variable is used, known_y's and known_x's can be ranges of any shape, as long as they have equal dimensions. If more than one variable is used, known_y's must be a vector (that is, a range with a height of one row or a width of one column).

• If known_x's is omitted, it is assumed to be the array {1,2,3,...} that is the same size as known_y's.

Const   is a logical value specifying whether to force the constant b to equal 0.

• If const is TRUE or omitted, b is calculated normally.

• If const is FALSE, b is set equal to 0 and the m-values are adjusted to fit y = mx.

Stats   is a logical value specifying whether to return additional regression statistics.

• If stats is TRUE, LINEST returns the additional regression statistics.

• If stats is FALSE or omitted, LINEST returns only the m-coefficients and the constant b.

Remember to end an array function entry by pressing Control-Shift-Enter!

The results of the LINEST function are presented in the following format.

| |A |B |C | | | |

|1 |bK |bK-1 |bK-2 | |b1 |b0 |

|2 |Std Err in bK |Std Err in bK-1 |Std Err in bK-2 | |Std Err in b1 |Std Err in b0 |

|3 |R2 |sy|x | | | | |

|4 |F |df | | | | |

|5 |regress SS |resid SS | | | | |

In this display the order of the bj coefficients is the opposite of the order in which the x data are entered. For example if the values of x1 to x3 are located in columns A to C, the value of b1 will be in column C; the values of b2 and b3 will be in columns B and A, respectively. The value of b0 is always in the last column.

The standard errors in the coefficients can be used to compute a confidence interval for the coefficients. This involves the use of the Student's t-distribution and a basic knowledge of statistical confidence intervals, confidence levels, and hypothesis testing. A simple rule of thumb is that the coefficient should be at least twice its standard error. If not, the coefficient is not significantly different from zero. The actual calculations, involving the Excel spreadsheet functions TDIST and TINV for the t-distribution, are shown below.
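Written out explicitly (this is the standard confidence-interval formula, included here as a reminder rather than taken from the original notes), the confidence interval on a coefficient bj is

b_j \pm t_{\alpha/2,\;N-K-1}\; s_{b_j}

where s_{bj} is the standard error of bj reported by LINEST, N - K - 1 is the number of degrees of freedom, and the t value for the chosen confidence level is obtained from TINV.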

Excel also has a regression tool as part of the Analysis ToolPak. This may have to be installed as an add-in if it does not already appear on your Tools menu.

Example Calculation

Use the LINEST function of Excel to check the following set of data.

|Test Data and Results for Linear Regression |

|xi |510 |533 |603 |670 |750 |

|yi |1.3 |0.1 |1.5 |1.8 |3.9 |

|Results: a = -5.77566; b = 0.012224; R2 = 0.768457 |

|slope |=LINEST(B2:B6,A2:A6,,TRUE) |=LINEST(B2:B6,A2:A6,,TRUE) |intercept |

|slope error |=LINEST(B2:B6,A2:A6,,TRUE) |=LINEST(B2:B6,A2:A6,,TRUE) |intercept error |

|R2 |=LINEST(B2:B6,A2:A6,,TRUE) |=LINEST(B2:B6,A2:A6,,TRUE) |sy|x |

|F |=LINEST(B2:B6,A2:A6,,TRUE) |=LINEST(B2:B6,A2:A6,,TRUE) |degrees of freedom |

|Regression sum of squares |=LINEST(B2:B6,A2:A6,,TRUE) |=LINEST(B2:B6,A2:A6,,TRUE) |Residual sum of squares |

|Ratio of coefficient to standard error |=ABS(G13/G14) |=ABS(H13/H14) |Ratio of coefficient to standard error |

|Probability that coefficient is zero |=TDIST(G18,H16,2) |=TDIST(H18,H16,2) |Probability that coefficient is zero |

|Confidence Level Desired |0.95 |=TINV(1-G20,H16) |t-statistic for confidence level |

|slope upper confidence level |=G13+H20*G14 |=H13+H14*H20 |intercept upper confidence level |

|slope lower confidence level |=G13-G14*H20 |=H13-H14*H20 |intercept lower confidence level |

Fitting Nonlinear Functions

It is possible to transform some nonlinear relationships into linear relationships so that the linear regression format may be used. For example, we can fit data to the polynomial equation y = a + bx + cx^2 + dx^3 by defining the following variables: X1 = x, X2 = x^2, X3 = x^3. Then we are trying to fit the following linear regression equation: y = a + bX1 + cX2 + dX3. Other kinds of transformations are possible. For example, chemical kinetic rate processes are sometimes modeled by the following equation: k = A T^D e^(-E/T). By taking the natural logarithm of this equation we have the following equation: ln(k) = ln(A) + D ln(T) + E(-1/T). Thus, we can define the following set of variables: Y = ln(k), X1 = ln(T), and X2 = -1/T. When we do this we can use linear regression to fit an equation of the form Y = a + bX1 + cX2, where a = ln(A), b = D, and c = E.
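As an illustration of the polynomial case, the following C++ sketch (added to these notes; it reuses the hypothetical buildNormalEquations routine sketched earlier) fills the three transformed predictor columns X1 = x, X2 = x^2, X3 = x^3 before building the normal equations for the cubic fit y = a + bx + cx^2 + dx^3:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical routine sketched earlier in these notes: builds the
// normal-equation coefficients A and right-hand sides c for K variables.
void buildNormalEquations(const std::vector<std::vector<double>>& x,
                          const std::vector<double>& y, int K,
                          std::vector<std::vector<double>>& A,
                          std::vector<double>& c);

// Set up the cubic fit y = a + b*x + c*x^2 + d*x^3 as a linear regression
// with the transformed variables X1 = x, X2 = x^2, X3 = x^3.
void setupCubicFit(const std::vector<double>& xData,
                   const std::vector<double>& yData,
                   std::vector<std::vector<double>>& A,
                   std::vector<double>& rhs)
{
    const int K = 3;
    std::vector<std::vector<double>> X(xData.size(), std::vector<double>(K));
    for (std::size_t m = 0; m < xData.size(); ++m) {
        X[m][0] = xData[m];                        // X1 = x
        X[m][1] = xData[m] * xData[m];             // X2 = x^2
        X[m][2] = xData[m] * xData[m] * xData[m];  // X3 = x^3
    }
    buildNormalEquations(X, yData, K, A, rhs);
    // Solving the resulting equations gives b0 = a, b1 = b, b2 = c, b3 = d.
}
```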

Some functional forms cannot have a simple linear transformation to a form that can be used with the linear regression algorithms. For example, the equation y = a(1 - e^(-bt)) cannot be transformed into a form that can be solved for a and b using linear regression equations. Nonlinear regression algorithms, which are much more complex than linear regression, can be used for this purpose.

References

Linear regression is discussed in Chapter 17 of Chapra and Canale. The introduction to part five presents some simple material on the statistical background for regression.

Excel help text for LINEST

LINEST

Calculates the statistics for a line by using the "least squares" method to calculate a straight line that best fits your data, and returns an array that describes the line. Because this function returns an array of values, it must be entered as an array formula.

The equation for the line is:

y = mx + b or

y = m1x1 + m2x2 + ... + b (if there are multiple ranges of x-values)

where the dependent y-value is a function of the independent x-values. The m-values are coefficients corresponding to each x-value, and b is a constant value. Note that y, x, and m can be vectors. The array that LINEST returns is {mn,mn-1,...,m1,b}. LINEST can also return additional regression statistics.

Syntax

LINEST(known_y's,known_x's,const,stats)

Known_y's   is the set of y-values you already know in the relationship y = mx + b.

• If the array known_y's is in a single column, then each column of known_x's is interpreted as a separate variable.

• If the array known_y's is in a single row, then each row of known_x's is interpreted as a separate variable.

Known_x's   is an optional set of x-values that you may already know in the relationship y = mx + b.

• The array known_x's can include one or more sets of variables. If only one variable is used, known_y's and known_x's can be ranges of any shape, as long as they have equal dimensions. If more than one variable is used, known_y's must be a vector (that is, a range with a height of one row or a width of one column).

• If known_x's is omitted, it is assumed to be the array {1,2,3,...} that is the same size as known_y's.

Const   is a logical value specifying whether to force the constant b to equal 0.

• If const is TRUE or omitted, b is calculated normally.

• If const is FALSE, b is set equal to 0 and the m-values are adjusted to fit y = mx.

Stats   is a logical value specifying whether to return additional regression statistics.

• If stats is TRUE, LINEST returns the additional regression statistics, so the returned array is {mn,mn-1,...,m1,b;sen,sen-1,...,se1,seb;r2,sey;F,df;ssreg,ssresid}.

• If stats is FALSE or omitted, LINEST returns only the m-coefficients and the constant b.

The additional regression statistics are as follows.

|Statistic |Description |
|se1,se2,...,sen |The standard error values for the coefficients m1,m2,...,mn. |
|seb |The standard error value for the constant b (seb = #N/A when const is FALSE). |
|r2 |The coefficient of determination. Compares estimated and actual y-values, and ranges in value from 0 to 1. If it is 1, there is a perfect correlation in the sample; there is no difference between the estimated y-value and the actual y-value. At the other extreme, if the coefficient of determination is 0, the regression equation is not helpful in predicting a y-value. For information about how r2 is calculated, see "Remarks" later in this topic. |
|sey |The standard error for the y estimate. |
|F |The F statistic, or the F-observed value. Use the F statistic to determine whether the observed relationship between the dependent and independent variables occurs by chance. |
|df |The degrees of freedom. Use the degrees of freedom to help you find F-critical values in a statistical table. Compare the values you find in the table to the F statistic returned by LINEST to determine a confidence level for the model. |
|ssreg |The regression sum of squares. |
|ssresid |The residual sum of squares. |

The following illustration shows the order in which the additional regression statistics are returned.

[Illustration: the LINEST output array. Row 1: mn, mn-1, ..., m1, b; row 2: sen, sen-1, ..., se1, seb; row 3: r2, sey; row 4: F, df; row 5: ssreg, ssresid.]

Remarks

• You can describe any straight line with the slope and the y-intercept:

Slope (m):

To find the slope of a line, often written as m, take two points on the line, (x1,y1) and (x2,y2); the slope is equal to (y2 - y1)/(x2 - x1).

Y-intercept (b):

The y-intercept of a line, often written as b, is the value of y at the point where the line crosses the y-axis.

The equation of a straight line is y = mx + b. Once you know the values of m and b, you can calculate any point on the line by plugging the y- or x-value into that equation. You can also use the TREND function.

• When you have only one independent x-variable, you can obtain the slope and y-intercept values directly by using the following formulas:

Slope:

=INDEX(LINEST(known_y's,known_x's),1)

Y-intercept:

=INDEX(LINEST(known_y's,known_x's),2)

• The accuracy of the line calculated by LINEST depends on the degree of scatter in your data. The more linear the data, the more accurate the LINEST model. LINEST uses the method of least squares for determining the best fit for the data. When you have only one independent x-variable, the calculations for m and b are based on the following formulas:

m = \frac{\sum \left(x - \bar{x}\right)\left(y - \bar{y}\right)}{\sum \left(x - \bar{x}\right)^2}

b = \bar{y} - m\,\bar{x}

• The line- and curve-fitting functions LINEST and LOGEST can calculate the best straight line or exponential curve that fits your data. However, you have to decide which of the two results best fits your data. You can calculate TREND(known_y's,known_x's) for a straight line, or GROWTH(known_y's, known_x's) for an exponential curve. These functions, without the new_x's argument, return an array of y-values predicted along that line or curve at your actual data points. You can then compare the predicted values with the actual values. You may want to chart them both for a visual comparison.

• In regression analysis, Microsoft Excel calculates for each point the squared difference between the y-value estimated for that point and its actual y-value. The sum of these squared differences is called the residual sum of squares. Microsoft Excel then calculates the sum of the squared differences between the actual y-values and the average of the y-values, which is called the total sum of squares (regression sum of squares + residual sum of squares). The smaller the residual sum of squares is, compared with the total sum of squares, the larger the value of the coefficient of determination, r2, which is an indicator of how well the equation resulting from the regression analysis explains the relationship among the variables.

• Formulas that return arrays must be entered as array formulas.

• When entering an array constant such as known_x's as an argument, use commas to separate values in the same row and semicolons to separate rows. Separator characters may be different depending on your locale setting in Regional Settings or Regional Options in Control Panel.

• Note that the y-values predicted by the regression equation may not be valid if they are outside the range of the y-values you used to determine the equation.

Example 1   Slope and Y-Intercept

The example may be easier to understand if you copy it to a blank worksheet.

1. Create a blank workbook or worksheet.

2. Select the example in the Help topic. Do not select the row or column headers.


3. Press CTRL+C.

4. In the worksheet, select cell A1, and press CTRL+V.

5. To switch between viewing the results and viewing the formulas that return the results, press CTRL+` (grave accent), or on the Tools menu, point to Formula Auditing, and then click Formula Auditing Mode.

|  |A |B |
|1 |Known y |Known x |
|2 |1 |0 |
|3 |9 |4 |
|4 |5 |2 |
|5 |7 |3 |
|6 |Formula |  |
|7 |=LINEST(A2:A5,B2:B5,,FALSE) |  |

Note   The formula in the example must be entered as an array formula. After copying the example to a blank worksheet, select the range A7:B7 starting with the formula cell. Press F2, and then press CTRL+SHIFT+ENTER. If the formula is not entered as an array formula, the single result is 2.

When entered as an array, the slope (2) and the y-intercept (1) are returned.

Example 2   Simple Linear Regression

The example may be easier to understand if you copy it to a blank worksheet.

1. Create a blank workbook or worksheet.

2. Select the example in the Help topic. Do not select the row or column headers.


3. Press CTRL+C.

4. In the worksheet, select cell A1, and press CTRL+V.

5. To switch between viewing the results and viewing the formulas that return the results, press CTRL+` (grave accent), or on the Tools menu, point to Formula Auditing, and then click Formula Auditing Mode.

|  |A |B |
|1 |Month |Sales |
|2 |1 |3100 |
|3 |2 |4500 |
|4 |3 |4400 |
|5 |4 |5400 |
|6 |5 |7500 |
|7 |6 |8100 |
|  |Formula |Description (Result) |
|  |=SUM(LINEST(B2:B7, A2:A7)*{9,1}) |Estimate sales for the ninth month (11000) |

In general, SUM({m,b}*{x,1}) equals mx + b, the estimated y-value for a given x-value. You can also use the TREND function.

Example 3   Multiple Linear Regression

Suppose a commercial developer is considering purchasing a group of small office buildings in an established business district.

The developer can use multiple linear regression analysis to estimate the value of an office building in a given area based on the following variables.

|Variable |Refers to the |

|y |Assessed value of the office building |

|x1 |Floor space in square feet |

|x2 |Number of offices |

|x3 |Number of entrances |

|x4 |Age of the office building in years |

This example assumes that a straight-line relationship exists between each independent variable (x1, x2, x3, and x4) and the dependent variable (y), the value of office buildings in the area.

The developer randomly chooses a sample of 11 office buildings from a possible 1,500 office buildings and obtains the following data. "Half an entrance" means an entrance for deliveries only.

The example may be easier to understand if you copy it to a blank worksheet.

1. Create a blank workbook or worksheet.

2. Select the example in the Help topic. Do not select the row or column headers.


3. Press CTRL+C.

4. In the worksheet, select cell A1, and press CTRL+V.

5. To switch between viewing the results and viewing the formulas that return the results, press CTRL+` (grave accent), or on the Tools menu, point to Formula Auditing, and then click Formula Auditing Mode.

|  |A |B |C |D |E |
|1 |Floor space (x1) |Offices (x2) |Entrances (x3) |Age (x4) |Assessed value (y) |
|2 |2310 |2 |2 |20 |142,000 |
|3 |2333 |2 |2 |12 |144,000 |
|4 |2356 |3 |1.5 |33 |151,000 |
|5 |2379 |3 |2 |43 |150,000 |
|6 |2402 |2 |3 |53 |139,000 |
|7 |2425 |4 |2 |23 |169,000 |
|8 |2448 |2 |1.5 |99 |126,000 |
|9 |2471 |2 |2 |34 |142,900 |
|10 |2494 |3 |3 |23 |163,000 |
|11 |2517 |4 |4 |55 |169,000 |
|12 |2540 |2 |3 |22 |149,000 |
|13 |Formula |  |  |  |  |
|14 |=LINEST(E2:E12,A2:D12,TRUE,TRUE) |  |  |  |  |

Note   The formula in the example must be entered as an array formula. After copying the example to a blank worksheet, select the range A14:E18 starting with the formula cell. Press F2, and then press CTRL+SHIFT+ENTER. If the formula is not entered as an array formula, the single result is -234.2371645.

When entered as an array, the following regression statistics are returned. Use this key to identify the statistic you want.

[Illustration: key to the LINEST output array for this example. First row: m4, m3, m2, m1, b; second row: their standard errors; third row: r2 and sey; fourth row: F and df; fifth row: ssreg and ssresid.]

The multiple regression equation, y = m1*x1 + m2*x2 + m3*x3 + m4*x4 + b, can now be obtained using the values from row 14:

y = 27.64*x1 + 12,530*x2 + 2,553*x3 - 234.24*x4 + 52,318

The developer can now estimate the assessed value of an office building in the same area that has 2,500 square feet, three offices, and two entrances and is 25 years old, by using the following equation:

y = 27.64*2500 + 12530*3 + 2553*2 - 234.24*25 + 52318 = $158,261

Or you can copy the following table to cell A21 of the example workbook.

|Floor space (x1) |Offices (x2) |Entrances (x3) |Age (x4) |Assessed value (y) |

|2500 |3 |2 |25 |=D14*A22 + C14*B22 + B14*C22 + A14*D22 + E14 |

You can also use the TREND function to calculate this value.

Example 4   Using The F And R2 Statistics

In the previous example, the coefficient of determination, or r2, is 0.99675 (see cell A17 in the output for LINEST), which would indicate a strong relationship between the independent variables and the sale price. You can use the F statistic to determine whether these results, with such a high r2 value, occurred by chance.

Assume for the moment that in fact there is no relationship among the variables, but that you have drawn a rare sample of 11 office buildings that causes the statistical analysis to demonstrate a strong relationship. The term "Alpha" is used for the probability of erroneously concluding that there is a relationship.

There is a relationship among the variables if the F-observed statistic is greater than the F-critical value. The F-critical value can be obtained by referring to a table of F-critical values in many statistics textbooks. To read the table, assume a single-tailed test, use an Alpha value of 0.05, and for the degrees of freedom (abbreviated in most tables as v1 and v2), use v1 = k = 4 and v2 = n - (k + 1) = 11 - (4 + 1) = 6, where k is the number of variables in the regression analysis and n is the number of data points. The F-critical value is 4.53.

The F-observed value is 459.753674 (cell A18), which is substantially greater than the F-critical value of 4.53. Therefore, the regression equation is useful in predicting the assessed value of office buildings in this area.

Example 5   Calculating The T-Statistics

Another hypothesis test will determine whether each slope coefficient is useful in estimating the assessed value of an office building in example 3. For example, to test the age coefficient for statistical significance, divide -234.24 (age slope coefficient) by 13.268 (the estimated standard error of age coefficients in cell A15). The following is the t-observed value:

t = m4 ÷ se4 = -234.24 ÷ 13.268 = -17.7

If you consult a table in a statistics manual, you will find that t-critical, single tail, with 6 degrees of freedom and Alpha = 0.05 is 1.94. Because the absolute value of t, 17.7, is greater than 1.94, age is an important variable when estimating the assessed value of an office building. Each of the other independent variables can be tested for statistical significance in a similar manner. The following are the t-observed values for each of the independent variables.

|Variable |t-observed value |

|Floor space |5.1 |

|Number of offices |31.3 |

|Number of entrances |4.8 |

|Age |17.7 |

These values all have an absolute value greater than 1.94; therefore, all the variables used in the regression equation are useful in predicting the assessed value of office buildings in this area.
