LINEAR REGRESSION:



LINEAR REGRESSION:

One of the most common methods of evaluating data, e.g., calibration curve data, is to graph the points and draw a straight line through them. Often the points have some degree of scatter and do not perfectly fit a straight line so the student must exercise judgment in determining the best straight-line fit.

The method of least-squares (linear regression) is completely objective and can be performed easily in Excel. Recall the equation of a straight line is y = mx + b, where m is the slope and b is the y-intercept. For example, x-values may be molar concentrations and y-values may be absorbance readings from a spectrophotometric calibration curve.

|KMnO4 Calibration Curve using SLOPE, INTERCEPT, RSQ, STEYX & TREND Functions |

|X values |Y observed |Y calc. |Y(obs -calc) |% deviation | |new y's |calc. x's |

|[M] |Abs |Abs (calc) | | | |sple. Abs |sple. calc. [M] |

|0.000E+00 |0 |0.002 |-0.002 |-100 | |0.2 |7.96E-05 |

|1.029E-04 |0.257 |0.258 |-0.001 |-0.310 | |0.4 |1.60E-04 |

|2.058E-04 |0.518 |0.513 |0.005 |0.896 | |0.6 |2.41E-04 |

|3.087E-04 |0.771 |0.769 |0.002 |0.260 | |0.8 |3.21E-04 |

|4.116E-04 |1.021 |1.025 |-0.004 |-0.351 | |1.2 |4.82E-04 |

| | | | | | | | |

|slope = |2483.965 | |std. error of est. |0.003812261 | | | |

|intercept = |0.0022 | | | | | | |

|R2 = |0.9999333 | | | | | | |

R2 measures the goodness of fit of data to the regression equation. A perfect fit has R2 = 1.0 while the worst possible fit, i.e., completely scattered data, has R2 = 0. For established analytical methods such as spectrophotometry, specific ion, pH, you should obtain an R2 value of at least ‘four 9’s’, i.e., ( 0.9999.

The SLOPE, INTERCEPT, RSQ, STEYX and TREND Functions:

The slope, y-intercept, R2 value (RSQ), standard error (STEYX) and data interpolation (TREND) of an array of (x,y) data points are obtained as follows:

SLOPE:

1. Select a blank cell where the slope value is to appear. Click in the Formula Entry box, then, on the toolbar click Insert, Function, Statistical, SLOPE, OK.

2. In the Slope Dialog Box that appears, click the ‘go to worksheet’ icon in the known_y’s box, select the y-value cells (absorbance values), Enter.

3. In the known_x’s box click the ‘go to worksheet’ icon, select the x-value cells (conc. values), Enter, OK. The slope is displayed in the cell where the formula was written.

[pic]

INTECEPT, RSQ, STEYX and TREND Functions:

Repeat the process for the intercept using INTERCEPT(known_y’s, known_x’s)

Repeat the process for R2 value using RSQ(known_y’s, known_x’s)

Repeat the process for the standard error of y-estimates using STEYX(known_y’s, known_x’s)

TREND Function:

‘Trend’ allows you to calculate unknown values of linear functions by interpolating the data mathematically, i.e., using the regression equation to calculate unknown values. This is more accurate than trying to visually read a graph whether it is hand-drawn or computer-generated, e.g., enter a set of sample absorbances and calculate the sample concentrations. (Always include ( 2 standard absorbances as a check). Here’s how:

1. Select a range of empty cells opposite the sample absorbance values (new_y’s), then click the cursor in the formula bar, then click Insert, Function, Statistical, TREND, OK.

2. In the Trend dialog box you’ll enter the cell addresses requested but wherever y-values are requested insert x-values instead. Be careful here. i.e.,

use TREND(known_x’s, known_y’s, new_y’s) (This calculates conc., x from Abs., y)

instead of TREND(known_y’s, known_x’s, new_x’s). (This calculates Abs., y from conc., x)

3. Finally, CTRL+SHIFT+ENTER (instead of just ENTER or OK since you want an array of results). If, by mistake, you just press ENTER, press F2 (edit) and press CTRL+SHIFT+ENTER.

CHECK YOUR DATA FOR FLYERS (Outliers):

Calculate y-values (absorbance values) based on your regression equation and compare these with the corresponding observed y-values. If only one of the calculated y-values differs greatly from the observed value, it is likely that an error was made in preparing/reading the sample. If possible, the student should repeat this measurement or delete this point from the data set and repeat the regression analysis.

1. In one of the cells where a y-calc. value is to appear, type: =m*x+b e.g., =$C$10*B4+$C$11, where absolute cell references, e.g., $C$10 and $C$11 are used to get the values of m and b, but relative referencing, e.g., B4, is used to get a value of x. This formula is then copied (by dragging) to other cells to get other values of x-calc.

Alternately, use the Trend function and x-obs. values to generate y-calc. values

2. Now calculate the deviations (yobs-ycalc) and % deviations to identify outliers.

Number Format: Number format can be interchanged between scientific notation, common notation and others as follows: Select the cells, click Format, Cells, Number tab, Number; also select # of decimal places, then OK.

PLOTTING YOUR DATA:

Plot both your observed data and your calculated regression line. This will give a visual indication of goodness of fit of your data. To plot both data sets proceed as follows:

1. In the spreadsheet, drag to select the x-values and both sets of y-values. Note that Excel automatically chooses the left-most column of selected data as the x-values. Then click the Chart Wizard icon.

2. In the Chart Type Dialog box, on the Standard Types tab, click XY(Scatter). On the Chart sub-type section, click the various types until you see the description ‘Scatter with data points connected by lines’, then Next.

3. On the Data Range tab, note that ‘Series in Columns’ should be selected. Click the Series tab and with Series 1 selected, in the Name box, type in a name to appear in the legend for series 1 data, e.g., ‘y-obs.’ Verify that the x-values are correct and y-values correspond to the y-observed values on the spreadsheet. Next select Series 2 and name it, e.g., ‘y-calc’, then Next

4. On step 3 of Chart Wizard, on the Titles tab, type in a name for the graph. Don’t worry about the font size or subscripts. They can be adjusted later. Then tab or click to enter labels for the X-axis and Y-axis. On the Gridlines tab, you may add more gridlines if desired. Then click Next.

5. On step 4 of Chart Wizard, you may choose to have the graph displayed on a new (separate) sheet or ‘as an object in’ the same sheet. Then click Finish.

6. To obtain a white rather than gray background (save your printer some ink), place the cursor on the gray area (do not allow the cursor arrow to touch any other line; just gray space). You should see a pop-up menu that says ‘Plot Area’. Right click the Plot Area then left click ‘Clear’.

7. Note that the two lines are superimposed and as such are not useful for examining goodness of fit or identifying flyers. You will now change the format of the observed data to points only and change the format of the regression line to a line only so that the goodness of fit will be more apparent. Proceed as follows:

8. Move the mouse pointer over the lines on the graph and read the pop-up menu to see which data set is indicated (e.g., Series “y-obs”). With the chart selected, you can use any of the cursor move arrows to cycle through the chart components. Read their names in the Formula bar. Also note the blue box Excel draws around the selected y-values. Be sure that the observed y-values are selected, then left click on that data set (if not already selected). Click Format, Selected Data Series. In the Format Data Series dialog box that appears, on the Patterns tab, choose line, none and marker, automatic, then OK. This will show only the (x,y-obs.) data points and not a connecting line.

9. Move the mouse pointer over the remaining line and note that the pop-up menu identifies it as Series “y-calc”. Left click this series and then Format, Selected Data Series. Then on the Patterns tab, select line, automatic and marker, none, then OK. This will show only the (x,y-calc) line without its points.

10. Click the edge of the graph to deselect the lines and points; then examine the fit of y-obs. points to the y-calc. line. You may decide to repeat or delete flyers based the fit you observe.

11. Additional formatting can be done. For example, click the title ‘KMnO4 Calibration Curve’. To change the ‘4’ to subscript, select the character, ‘4’, by dragging the I-beam over it, then Format, Selected Chart Title. On the Font screen Effects area, select ‘subscript’, OK. The title will change to ‘KMnO4 Calibration Curve’. The font size of axes labels can be changed in similar fashion. The font size and number of decimal places of the axes values can be changed after selecting the axes, then click Format, Selected axis and use the Font and Number tabs to access these controls.

12. Click the Drawing Tools icon to Insert a Text Box and in it type important regression statistics..

[pic]

The LINEST Function:

LINEST (short for LINear ESTimation) function performs linear regression analyses on a set of (x,y) data points. The general form of the linear equation that can be handled by LINEST is

y = m1x1 + m2x2 + m3x3 + … + b

LINEST returns the array of regression parameters mn …m2, m1, b

The syntax is LINEST(known_ys, known_xs, const_logical, stats_logical)

If const_logical is TRUE, or 1, or omitted, the intercept b is given.

If const_logical is FALSE, or 0, the intercept b is not given.

If stats_logical is TRUE, or 1, an array of regression statistics are given in addition to values of m and b.

|KMnO4 calibration curve using Linest | | |

|X values |Y observed | | | | |

|[M] |Abs | | |m |b |

|0.000E+00 |0 | |parameters |2484.0 |0.0022 |

|1.029E-04 |0.257 | |std. dev's |11.7 |0.00295 |

|2.058E-04 |0.518 | |R2, SE(y) |0.999933 |0.003812261 |

|3.087E-04 |0.771 | |F, df |44952.8 |3 |

|4.116E-04 |1.021 | |ss(reg), ss(resid) |0.653 |4.36E-05 |

Applying LINEST to the spectrophotometric data for KMnO4 follows:

1. Select an array of cells (5R ( 2C). Click in the formula box, then Insert, Function, Statistical, LINEST, OK.

2. In the LINEST dialog box for known_ys, click the go to worksheet icon and select the known y values, Enter.

3. In the LINEST dialog box for known_xs, click the go to worksheet icon and select the known x values, Enter.

4. In the LINEST dialog box for Const, type 1, then tab to the Stats box and type 1, then do not click the enter button but rather type CTRL+SHIFT+ENTER. If, by mistake, you hit only ENTER, type F2 (formula edit), then CTRL + SHIFT + ENTER.

[pic]

SE(y) is short for ‘Standard Error’ of the y-values.

Explain the meaning of SE(y), i.e., how is it derived?

How can you use it?

REGRESSION USING DATA ANALYSIS IN TOOL PAK:

1. Arrange the data in columns with the x variable on the left and the y variable on the right. Make space for the results of the regression analysis to the right of the data. Allow a 16R(7C array of space.

2. Choose Data Analysis from the Tools menu; if the Data Analysis command is not present in the Tools menu, you must use the Add-Ins command in the Tools menu to install it.

3. After choosing Data Analysis, choose Regression, OK from the Analysis Tools list box. The Regression dialog box will prompt you to enter the range of dependent variables (y). Enter your data label along with the data.

4. Tab to the ‘Input X_Range box and enter the range of independent variables (x) along with its label.

5. Select the Labels box because we have included the labels with the data ranges

6. Select ‘Constant is zero’ only if you want to force the regression line through the origin (0,0).

7. Confidence Level: Excel automatically includes 95% confidence intervals for the regression coefficients.

8. Output location: Click the Output Range button. Click to select the range edit box on its right, and point to or type a reference for the top left corner cell of the array where the output summary will appear.

9. Residuals: Select this box to obtain the fitted values (predicted y) and residuals.

10. Residual Plots: Select this box to obtain charts of residuals versus each x-value.

11. Standardized Residuals: Select this box to obtain standardized residuals (each residual divided by the standard deviation of the residuals). This output makes it easy to identify outliers.

12. Line Fit Plots: Select this box to obtain and XY(Scatter) chart of markers for the (x,y) data points and a line showing predicted (x,y) values, i.e., the regression line.

13. Normal Probability: Do not check this option.

14. Finally click OK The summary chart appears.

15. To change column widths so that all summary info is visible, make nonadjacent selections of all cells that are too small. From the Format menu choose Column, Auto Fit Selection.

16. Three tables are produced: regression statistics, analysis of variance (labeled ANOVA), and regression coefficients. The analysis of variance data can be deleted as follows: Select the cells, the Edit (or right click), Delete, Shift Cells Up, OK.

17. Residuals data may take extra space here and may have to be moved opposite raw data.

Comparison of LINEST and Data Analysis:

• Data Analysis regression provides more complete statistical data and the output is clearly labeled.

• The advantage of LINEST is that it is dynamically linked to the original data and is automatically updated if the raw data is changed.

[pic]

The ‘Line Fit Plot’ is simply a graph of the data. Both predicted and observed data appear as points.

Change the predicted values to a trend line without markers to see how well the data fits the trend line.

Residual plots are better indicators of data/trend line fit. The residual chart is located to the right of the regression summary output. If the relationship between concentration and absorbance is linear, then random pattern should appear in the residuals plot. On the other hand, if we see curvature or some other systematic pattern, then we should change our model to incorporate the nonlinear relationship. We will study nonlinear regression next.

Residual plots are also helpful in identifying situations where the residuals are smaller in one region and larger in another. The residuals plot would have the shape of a tree resting on its side. In such cases the standard error of the estimate, which summarizes all of the residuals terms, would overestimate the calculated concentration in one region and underestimate the concentration in another.

[pic]

Residuals are determined as (yobserved-ycalculated). Both the residuals plot and the residuals output data show that the calculated y-values are low in the middle concentrations and higher at the extremes. However the relative deviation, as we have already calculated, is quite small (in the order of 0.3 to 1.0% deviation).

Exercise: Apply Data Analysis to the KMnO4 calibration data. Obtain the same results as shown above.

LINEAR REGRESSION by ‘Adding a TRENDLINE’:

For ‘Add a Trendline’ function, you plot a graph first and then get the regression equation from the graph.

1. Enter the x and y values, e.g., conc. and absorbance, on an Excel spreadsheet, putting the y-values in a column to the right of the column of x-values.

2. Drag to select the entire array of x and y values.

3. Click the Chart Wizard icon on the toolbar.

4. Step 1 of the Wizard: Under Standard type choose XY(Scatter). Under Chart sub-type select ‘Scatter, compares pairs of values’; then Next. (Do not use a line joining data points.)

5. Step 2: On the Data Range tab, ensure that ‘series in columns’ in selected and under the Series tab type in a name, e.g., Data; then Next.

6. Step 3: Under the Titles tab, type a chart title and X axis and Y-axis labels. Under the Gridlines tab, add x-axis major gridlines; then Next.

7. Step 4: Click to place the chart as an object in the spreadsheet or on a new sheet, Finish.

8. Clear the background by right clicking on the gray area and click Clear. Using the cursor click on a data point, click Chart, Add Trendline. Under the Type tab, in the Trendline dialog box choose the Linear Trend/Regression type. Also under the Options tab, select ‘Display equation of chart’ and ‘Display R-squared value on chart, then click OK. The R2 value and regression equation are displayed and a line corresponding to the regression equation is drawn on the chart. Make the trend line very thin by first selecting it, then choose Format, Selected Trendline, and change the ‘weight’ on the patterns tab to hairline (very thin), OK.

9. IMPORTANT: Increase the number of decimal points in the regression equation and R2-value by clicking to select this data and repeatedly clicking the ‘increase decimal points’ icon. This change cannot be made after any other reformatting of this data, such as changing font size.

10. Change the name of the trend line as follows: Click to select the trend line, then click Format, Selected Trendline. Under the Options tab change Trendline name from automatic to custom and type in ‘Regression Line’, OK. Make other formatting changes as desired, e.g., font size of X axes values, etc.

11. You now have a graph and an equation of the line of best fit. Use this equation to calculate any unknown x-values, e.g., conc. of samples, from measured y-values, e.g., absorbance of samples. Do the calculation by writing the appropriate formula in an empty spreadsheet cell and drag to other cells to automatically obtain all results.

What formula will you write in an empty cell to complete step 11? Do this now.

[pic]

PRINTING YOUR GRAPH:

1. Printing from Excel: In step 4 of the Chart Wizard, you may opt to ‘place your chart as an object in’ the current spreadsheet or to ‘place the chart as a new sheet’. You can print the graph from either location. If the chart is placed in the current spreadsheet, select it first (sizing handles appear when selected), then choose File, Print. It is always good practice to Print Preview before activating the Print command.

2. Printing from MS Word (or other programs): In Excel, Select, then Copy the Excel graph to the clipboard using the copy icon or the Edit, Copy command (or CTRL + C).

Open MSWord. In MS Word, place the cursor where you want the graph to be inserted. When inserting between existing lines of text, you do not have to create blank spaces; Word will do this for you. Then use the Edit, Paste Special command. In the Paste Special dialog box choose either ‘Paste’ or ‘Paste Link’ and choose ‘Microsoft Excel Chart Object’, then OK. Using the Paste Special command pastes the chart as an object making it easier to move and center on the page compared to simply Pasting from the clipboard (CTRL + V). The Paste Link option allows your graph to automatically update in MSWord if you later make changes to the Excel data or graph, however, both the Excel and Word files must be located in the same place (disk or drive) as they were when the Paste Link was created.

3. Print the graph using the program’s Print command.

-----------------------

This section moved from right to fit page.

|Upper 95% |Lower 95.0% |Upper 95.0% |

|0.01159766 |-0.00719766 |0.01159766 |

|2521.24955 |2446.680479 |2521.24955 |

[pic]

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download