Regress™ User’s Guide



[pic] Regress™ User’s Guide

Jeffrey H. Moore

Graduate School of Business, Stanford University

Introduction

A statistical regression add-in to Microsoft Excel on Macintosh and Windows computers, Regress[1] makes exploratory regression analysis fast and convenient. It does this by displaying preliminary results in a temporary window that you can easily dismiss during the exploratory modeling phase. This allows you to run many regression models, interactively adding and dropping variables, as you evaluate alternative models for reasonableness and goodness-of-fit.

Since Regress is an add-in to Excel, you have all the normal Excel worksheet capabilities available. This includes, for example, the ability to cut and paste the results into another Excel worksheet, Microsoft Word document, or other Windows or Macintosh compatible application. Moreover, you have complete freedom to change the formatting, the fonts, the size of tables and charts, etc., to suit your particular requirements. This flexibility is rarely present in other dedicated regression packages.

I Tutorial

Getting Started with Regress

To begin, launch Excel as usual.[2] Regress should appear as the last item on the "Tools" menu. See the figure below, left. If you do not see “Regress” listed, it has not been properly installed on your computer (or, if you purchased Regress, you neglected to re-start Excel after installing it on your computer.)

Now, click the “Regress” item on the Tools menu, causing Regress to be loaded into Excel as an Add-In. During the load process, you should see three changes:

(1) Excel will ask if you wish to allow a macro add-in to be opened. Click “Enable Macros” to begin. [pic]

(2) A herald window will remind you that you need to open a workbook containing your data. (See the figure below, right.)

(3) Finally, a small toolbar entitled “Regress,” containing three buttons, will appear in the upper right corner of your screen. (See the small figure below.)

[pic]

When the herald window closes, loading of Regress is complete. Regress will remain loaded into Excel and available until you quit Regress by clicking the “Q” button. Normal spreadsheet functions are unaffected by Regress, and you can use Excel as usual. Since its only function is to initiate loading, the “Regress” item on the Tools menu is removed after the Regress Toolbar appears. Thereafter, all regression modeling is done by clicking each of the three Regress Toolbar buttons, the middle one of which is not yet defined.

The Regress Toolbar “floats” above all Excel windows, and as with any Excel toolbar, you may move it out of the way or dock it by click-dragging its title bar.[3]

A sheet within a workbook containing the data for your regression modeling is called a Regress “datasheet” in this User’s Guide. The topmost row in any Regress datasheet is a special row containing each variable’s name at the top of its column. The actual (numerical-only!) data should begin immediately below the variable names and occupy as many rows as there are cases. (In a database, data set rows are usually called “records;” in data analysis they are frequently called “cases.”) Important: Data must be in columns, each with a variable name at the top, and a missing datum, if any, in a column must be represented as a completely empty cell.

Now, using the standard Excel “File Open” dialog in the File menu, open the MY_DATA.XLS example datasheet, a copy of which should be on your hard disk or on a network disk. (Purchasers of Regress may copy the file from the installation disk.) This datasheet contains the example data which should appear as in the figure below.[4] The first MY_DATA variable is entitled “No.” and is nothing more than a numeric row label for each of the case numbers. The actual variables used in the Tutorial are named My_Var_1, Y, and Age.[5]

[pic]

Important: Each column variable in a Regress datasheet must be Range Named by Excel. Range names attach a range of worksheet cell addresses to a name, and Regress uses this approach to map the variable name you select to its data. Range names are saved with the datasheet, and so, range naming the data need only be done once. Since Regress accesses the data only through range names, you are free to have non-range-named information (descriptive labels or text, formulas, charts, XY scatterplots, extraneous data, etc.) present elsewhere on your datasheet.

The variables of MY_DATA have already been range named. However, the easiest way to Range-Name a new sheet is to highlight the entire block of the data, including the variable names at the top of the columns, and then select the “Name Create...” item from the Insert menu. Clicking “OK” causes Excel to define each column as a separate range named variable. For example, MY_DATA was range named by highlighting cells A1 through D11, followed by selecting the “Name Create...” menu item from the Insert menu, and then clicking “OK.”

Your first task is to verify MY_DATA’s range names by selecting "Name Define..." from the Insert menu, as below.

[pic]

As seen in the Define Name dialog box below, you can verify the range for any variable by clicking its name and noticing the range of cells to which that name refers. For example, the name “My_Var_1” has been assigned the range of cells B2 through B11. (Always verify that the variables in your datasheet have the proper ranges associated with them. Otherwise, Regress may not process all your data!) After reviewing the variable ranges, click “OK” to close the Define Name dialog box.

At this point, you should have the Regress Toolbar showing on your screen and have your properly ranged-named datasheet, MY_DATA, opened in Excel. For simplicity, there should be no other workbooks open except your datasheet.[6]

Now you are ready to begin exploratory regressions.

[pic]

[pic]

Exploratory Regression Analysis

Be sure your datasheet is open, and click the left “Regression” button [pic] on the Regress Toolbar.

Clicking the left button on the Regress Toolbar always displays the Exploratory Regression dialog box, as below. Displayed on the left is an alphabetic list of all the range-named variables in your open datasheet.[7]

[pic]

The entry at the top right is the dependent variable you wish to predict. The dialog defaults to the first variable in the variable list. It is a drop-down list and you may select another variable to be your dependent variable. Select the variable named “Y” as the dependent variable, as shown below.

[pic]

Now you are ready to select the independent (predictor) variables used to predict the dependent variable, Y. You do this by selecting one or more variables from the list on the left. The variable(s) that you select will be included in either a simple regression model (selecting one independent variable) or a multiple regression model (selecting more than one independent variable). For now, select the variable Age as the single independent variable you will use to predict values of Y. See below.

[pic]

Next, click the [pic] button to cause the (simple) regression model to be fit to the data. Regress will open a new Excel workbook called “Exploratory Results,” shown below. By default, Exploratory Results displays the names of the dependent and independent variable(s) and two summaries, the “Regression Statistics” and the “Summary Table.”[8]

The Regression Statistics gives information about your model’s goodness of fit (signaled by the Adjusted R-square, “Adj.RSqr”), the Standard Error of the Regression, “Std.Err.,”[9] the number of cases used, “# Cases,” and number of cases dropped because of missing data, “#Missing,” if any.

The Summary Table gives the coefficients of the regression model, “Coeff.,” along with their standard errors, “Std.Err.,” and the resulting statistics, “t-Stat.” and “P-value,” used for hypothesis tests on the coefficients.

[pic]

At the top of the Exploratory Results window are two buttons that you may click for additional information:

Help gives a brief summary of your options.

Correlation presents the correlation matrix among all the pairs of variables in your model.

Because it is just another worksheet, you are free to open new windows and do other work even though the Exploratory Results window is open. Also, you can save Exploratory Results to disk or print it, if you wish. However, these are not its intended functions. Rather, Exploratory Results presents preliminary information about your regression model that you will either dismiss immediately in favor of another model or keep in a more detailed report worksheet.

[pic]

Keeping a Regression Model

When the Exploratory Results window appears, a change occurs in the Regress Toolbar buttons. All three buttons become defined, as shown on the right.

Clicking each of the Regress Toolbar buttons produces a different action, as documented below.

[pic] REGRESSION: Returns to the Exploratory Regression dialog for another round of modeling with this datasheet.

[pic] REPORT: Keeps the Exploratory Results model as a new worksheet report within the datasheet’s workbook.

[pic] STOP: Stops regression modeling with this datasheet.

Let’s assume that you want to keep this regression model. So, click the middle “Report” button. This produces the “Regression Results” dialog box, as shown below, containing many options for things to compute and keep in your report.

[pic]

The first entry in the Regression Results dialog box is the name you wish to give for the report worksheet of your model. Regress always proposes to call your first model “Regression #1.” You can leave this name as it is or substitute a more meaningful name of your own choosing.[10].

Change the dialog box’s Regression Name to be “My Regression Model.” You may now select the tables and plots that you would like to save, including if you wish, the Exploratory Results information Regress previously calculated. So, click the Descriptive Statistics option. This will produce another dialog box listing some standard statistics for your variables, as below. These statistics are not regression model statistics, but are computed from your raw data for possible comparisons later with your regression model results.[11] Accept these defaults by clicking the “OK” button, returning you to the “Regression Results” dialog box.

[pic]

Now, click the boxes to keep the items shown in the dialog below. You need not collect all this information. Of course, you can always check everything to get as much information on the regression model as possible. However, that the more things you check (especially plots) the longer it will take Regress to prepare your report and the more random access memory will be taken to hold your workbook.

[pic]

Leave the items under “Advanced Options” unchecked. They will be covered later. Now click the OK button. As you can see below, Regress will create a new worksheet with your Regression Name assigned to it containing all the tables and plots, and place it into the workbook containing your datasheet.[12]

At the far left of your worksheet report are small (+,-) symbol buttons of an Excel outline. These buttons allow you to collapse and expand portions of your results so that you can manage the view of information detail on your display screen. By default, the Regression Statistics and Summary Table are expanded, as signaled by the buttons with “-” signs. A button with a “-” sign has no more detail to display; a button with a “+” sign conceals information. You can reveal, i.e., open, the concealed information by clicking any button having a “+” sign, thereby expanding that portion of the worksheet.

[pic]

For example, click the “+” outline button to expand the Descriptive Statistics, as shown below.

[pic]

Clicking a “-” sign outline button collapses information. For example, if you click the “-” sign outline button next to the Summary Table, the information subtended by the vertical line above it will be concealed. Remember, the information is not erased; it is simply hidden from view.

Expand the Histogram of Residuals, as shown below.[13] Two histograms are presented; the darker (red) bars are the actual residuals from the model, and the lighter (blue) bars give the height of the theoretical bell-shaped normal curve that you would expect from normally distributed residuals. This allows you to compare visually the shape of the theoretical and the actual frequency histograms to see if the results violate the “normal distribution of residuals” assumption for a regression model.[14]

[pic]

Next, collapse the Histogram of Residuals and expand the Line Fit Plot., as shown below.

[pic]

[pic]

Whenever you define a “simple” regression model, the Line Fit Plot shows the Predicted-Y verses Age regression line itself with the scatter of Actual-Y values around it. (This plot is not possible for models with more than one predictor variable.) The vertical distance between the Actual-Y values and the regression line is the “Residual” or “Error” value for that case, given your current regression model.

Now, collapse the Line Fit Plot and expand the Residual Plots. Notice that two-level outlining is used advantageously for nesting the Residual Plots. You can expand and collapse each of these residual scatter plots by clicking its outline button or collapse all of them as a group.[15]

Click the outline buttons to expand the “Residuals -vs- Predicted Plot,” and the “Age Residual Plot,” as shown below.

[pic]

[pic]

Don’t forget that every Regress produced report is an entry in your datasheet’s workbook and is an Excel worksheet with its gridlines and row/column entries turned off. You can change fonts and number formats, copy/paste, annotate, etc., at will.

This completes your first regression model. Now is a good time to save your workbook. So, select the “Save” or “Save As” item from the File menu. Excel will prompt to convert MYDATA.XLS to workbook format during the Save operation. If you do not convert it to a workbook, Excel will save only the active worksheet and not the entire workbook.

How do you go back to create other regression models from this datasheet using different variables? You do this at any time by clicking the left button on the Regress Toolbar. At this point, you have several other options, however.

Clicking the middle “Report” button again will give you the opportunity to collect additional information from the previous model that you might not have checked earlier. To illustrate this, click the Report button. Note the same Regression Results dialog box appears. Regress always remembers your last actions.

If you click OK at this point, you will get a second report that is identical to the first one you kept. Since they would be redundant, un-check most of the tables and plots that you checked previously. Now, instead, check the boxes to keep the Line Fit Plot, Add Confidence Intervals, and Predicted and Residual Matrix, as below. When you check the Predicted and Residual Matrix option, the “Leverage Values” option un-gray’s to allow its selection. Leave this box un-checked for now.

[pic]

Click the “OK” button. Regress will notice that you did not change your Regression Name and will complain, as below:

[pic]

Since you are asking for more information on the same regression model, normally, you would “Append” the new information to the previous report worksheet. However, for the sake of illustration, let’s keep this second report in a separate worksheet. So, click the third option (“Save results instead to:”) to place the report into a second worksheet whose name defaults to an incremented version of the previous name, which you can override, if you wish, with another name. For now, accept the new default name by clicking the OK button.

Regress opens a new worksheet using the name of “My Regression Model #2.” Contained in it is a new Statistics and Summary Table, Line Fit Plot, and the Residual Table. Whenever the advanced option “Add Confidence Intervals” is checked, Regress also produces three additional statistics in the Statistics and Summary Table: (1) the regression model’s degrees of freedom, “Deg.Free,” and using the degrees of freedom and the default 95% option, (2) the “back-of-statistics-book” t-table value for a two tail, (1-95%)/2 = 2.5%, hypothesis test, “t(2.5%, 8).” In addition, (3) the 95% confidence intervals are presented for each regression coefficient.

[pic]

Next, expand the Line Fit plot. Whenever you define a one-predictor regression model and check the “Add Confidence Intervals” option, Regress also adds the 95% confidence interval lines for forecasting the range of individual population values of Y given Age.[16]

[pic]

In all appropriate tables and plots, such as above, you may, of course, override the 95% default Regress uses by changing the “95%” entry in the earlier dialog box, repeated below.

[pic]

Next, expand the Residual Table, as shown below. The column labeled “Actual” gives the values of the dependent variable, Y, originally given in the datasheet. Next are the “Predicted” values of Y, calculated from the regression model equation for the values of your independent variable(s). These Actual and Predicted Y values were plotted earlier on the Line Fit Plot. In addition, the difference between the two for each case is given, called the “Residual.” The “% Error” is the ratio of the Residual to the Actual for each case, ignoring the sign of the Residual. The bottom line “Average Predictive Error” is the average of the % Error column. Finally, the residuals are “standardized” by dividing each case’s Residual by the Standard Deviation of the Residuals to get a measure of variability that is dimension-less. These are the Standardized Residuals, “Std.Res.” (Note: the “Standard Error of the Regression,” given earlier in the Regression Statistics Table as “Std.Err.”, and the “Standard Deviation of the Residuals” are synonymous.)

[pic]

[pic]

The figure to the right shows an alternative way to select among any windows that are open by using the Window menu. Notice that your workbook is the only (un-hidden) window open at this point. The two kept model report worksheets, or more precisely, two worksheets of information about the same regression model, are sheets inside the workbook. Also, note that the “Unhide...” option on the Window menu is not grayed out. Regress has hidden the Exploratory Results worksheet to speed computation by eliminating the time to update its screen image.

Let’s try a different regression model on the same datasheet. So, “dismiss” the current model by pressing the left Regress Toolbar button, thereby returning to the Exploratory Regression dialog box, as shown below. Regress preserves the original selection of variables, and you can now choose a different dependent variable or a new subset of independent variables.

[pic]

This time, select “Age” and “My_Var_1” as two predictor variables. You select several independent variables by holding down the Ctrl key (Windows), or Command key (Macintosh), and clicking each variable name. Now, click the Regress button to start the process again. The Exploratory Results window will be un-hidden and updated to present the results of the new (multiple) regression model, as shown below.

[pic]

At this point, let’s compare this two-predictor-variable multiple regression model with the simple one-predictor-variable regression previously saved in your workbook. Since all Regress produced information is in Excel worksheets, you can display multiple windows at any time. So, click the “My Regression Model” tab in the workbook to compare visually the previous model’s results with the new ones in Exploratory Results.

The figure below shows the result of going back to the workbook and comparing the old model with the new one. If your display screen is large enough, you can move the workbook window below or to the side of Exploratory Results to compare continuously new exploratory models against earlier results.

[pic]

Now, go back to Exploratory Results by clicking on its window title bar (or using the Window menu). Let’s keep a report for the two predictor variable model. So, click the middle “Report” button on the Regress Toolbar to get the Regression Results dialog box with a proposed name for your, in this case, third worksheet report. Since this is an entirely new model, you should override the proposed name. Give the model a more meaningful name, “Y vs. Age and My_Var_1,” and select the items to keep, as shown in the figure below.

[pic]

Click “OK” to get the worksheet report for this new model, as shown below. Notice that the “Y vs. Age and My_Var_1” model has become another sheet in your workbook. You now have four sheets in the workbook (a datasheet and three regression reports), completely documenting all of your work in this modeling session for future perusal or printing.

[pic]

Expand the Correlation Matrix table to see the simple correlation between any two pairs of variables. This is useful for spotting any (first-order) collinearity among the independent (predictor) variables.

[pic]

Next, expand the Residual Table. Note the smaller Average Predictive Error percentage produced by the higher R-Square results of this two variable regression model.

[pic]

Next, expand the Line Fit Plot. Notice that it has changed from that of the simple regression model shown earlier. Since, the regression “line” cannot be plotted for regression models with more than one predictor variable on a two dimensional plot, the Line Fit Plot changes to plotting the Actual-Y and Predicted-Y (as calculated in the Residual Table above) for each of the cases.

[pic]

Stopping Regress Modeling

At this point, let’s discontinue regression modeling with the “MY_DATA” datasheet. To stop, click the right (“Stop”) button on the Regress Toolbar: [pic]. The figure below shows the result of clicking the Stop button. The (hidden) Exploratory Results window has been closed and all reports are contained as sheets in the original MY_DATA workbook. This allows you to keep a complete diary of every report you kept along with the datasheet that produced those results, a great convenience for reviewing or auditing your work at a later time.

[pic]

You may select the “Move or Copy sheet...” item in the Edit menu, as shown below, if you prefer to keep your datasheet or any particular worksheet report in a separate workbook.

[pic] [pic]

Notice that after clicking “Stop,” the Regress Toolbar changes its appearance again. Since there is no exploratory model now, the middle “Report” button becomes blank. To resume regression modeling later in the Excel session, open a workbook, select a datasheet (with Range-Named variables!), and click the left “Regression” button on the Regress Toolbar.

The figure below shows the Select Datasheet dialog box that Regress may offer when you click the left button. In this example, two workbooks happen to be open simultaneously, and Regress is asking you which datasheet to use: the one you originally opened, MY_DATA.XLS, or another open one called “Companys.xls.” Optionally, you may click the Open button to open another previously saved workbook.

[pic]

If you were interested in the data from COMPANYS, you would select it and click the OK. Regress would then produce the Exploratory Regression dialog box for that datasheet. You can, therefore, have several datasheet workbooks open at once. However, during your modeling, Regress must then ask you to remove ambiguity by selecting the datasheet of interest. You can avoid this dialog by closing all but the one workbook of interest.

Quitting Regress

After stopping Regress, the right “Stop” button in the Regress Toolbar changes to a [pic] to allow you to completely “Quit” from the Regress application. Clicking the “Q” button frees primary memory by removing the Regress add-in software from the Excel workspace and deleting the Regress Toolbar. Quitting enables the Regress item in the Tools menu to allow re-loading Regress at a later time.

If you quit from Excel now, Excel will verify if the workbook has been saved recently to the hard disk and may offer you the usual “File Save” dialog in which you can give the workbook a more meaningful name. To be safe, you should periodically select the “Save” item under the File menu to keep a running backup of your results.

As shown below, the workbook, including all its worksheet entries, will be saved under the name “My_Work.” (Under Windows: My_Work.xls) Now you can quit Excel or open another workbook, at your option.

[pic]

Technical Limitations

Use of Regress within Excel is governed by two important limitations:

1. It is common to forget how many tables and charts you end up “keeping” into a workbook during a Regress modeling session. It is easy to accumulate ten or more regression model worksheet reports, each with a dozen charts or more. Ten worksheets and 120+ charts is a sizable chunk of primary memory for Excel to handle, especially for larger data sets. If you wait to get Excel’s “Out of Memory” message, it may be too late to save your workbook! This is especially true for Windows, as its internal User/GDI system resources memory is of fixed size and is used up quickly by the graphics even if you have lots of RAM or Windows swap file space on your hard disk. So, save your workbook often!

2. Regress is limited by restrictions within Excel to no more than 16 variables in any one regression model and to no more than 250 cases.

Conclusion

There are many more useful options available in Regress that have not been covered in this initial tutorial section. For example, you are always free at any time to perform non-linear regression modeling or to use the Chart Wizard to produce scatter plots for your raw data. These are only two examples of the almost unlimited flexibility Regress coupled with Excel offers in your regression modeling. The following three sections elaborate on these and other more advanced options in Regress, and additional tips and techniques, including non-linear regression modeling, that you will find useful. There is also an important fourth section on Problems and Error Messages that you will find useful.

This tutorial section has given you an introduction to exploratory regression analysis with Regress. As you see from this brief summary, the Excel base Regress builds upon makes it quite flexible and very convenient for developing your final regression model. Most likely, this spreadsheet foundation is Regress’ most endearing capability. Good luck with regression modeling and...

“May the High R-Square Be With You.”

II Advanced Regress Reporting

The Regression Results dialog allows several other options for you to add to the report for your model: Analysis of Variance (ANOVA) Table, Residual Statistics, Leverage Values, and Forecast with Regression, as checked below.

[pic]

Analysis of Variance

The ANOVA table summarizes the regression model equation into an F statistic. The P-value associated with this statistic allows test of the overall significance of the regression model apart from the significance, if any, of the equation's individual regression coefficients. For example, extreme multi-collinearity could produce a Regression Summary report with few or no significant regression coefficients, even though the overall regression model has high predictive value (high adjusted R-square statistic). High predictive value of the model would be signaled by an F statistic with low P-value. Loosely speaking, the P-value of the F statistic tests the overall adjusted R-square value, "Adj.RSqr," as being significantly different from zero, i.e., that at least one regression coefficient is significantly different from zero.

[pic]

Residual Statistics

Checking the Residual Statistics option in the Regression Results dialog, adds the following Residual Statistics table to worksheet report.

[pic]

The “#Res.>0” and “#Res. ................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download