USE OF NCSS (Number Cruncher Statistical Systems)




Introduction. Although we use Microsoft Excel in General Chemistry for data analysis and curve fitting, a more sophisticated package is required in Chemistry 160. NCSS, written by Dr. Jerry Hintze, meets virtually all our needs and hence has been adopted as the standard. The package is also used by the Mathematics Department because it gives the correct result where many better-known packages fail.

We shall use only a small part of NCSS’ features. The tools that we need are distributed among the modules, and some of the calculations must be repeated in order to obtain all the desired information. Important special cases are discussed in this handout. Refer to the manuals or the extensive help files to handle other cases and questions.

Data Preparation. Start NCSS by clicking on the NCSS icon. A new spreadsheet opens automatically. You may enter your data as you would in an Excel spreadsheet, and you can also import data created by another spreadsheet program such as Excel. To label the columns, click on the Variable Info tab and enter your variable names in the column marked Name. You can provide a more detailed entry in the column marked Label. Do NOT label columns in the Excel manner by inserting names in the spreadsheet cells. Return to the spreadsheet by clicking on the Sheet1 tab. If you wish to use the Excel-style method of defining and replicating the contents of cells, note that calculations use the internal designation of the columns rather than their names. That is, the first column is the A column; the second, the B; and so on. For example, suppose that the first five integers have been entered into rows 1-5 of the leftmost column, whose name is X. To place the squares of these integers in the next column, click on the top cell in the second column, enter =A1^2 (not =X1^2), click on Copy in the File menu, select the next four cells in the second column, and finally click on Paste in the File menu. When you analyze a set of data, NCSS uses all the numbers in a column. Hence, if you wish to use a subset of the data, create a new column containing only the subset to be used. The new column can be the same length as the full column if you duplicate the full column and blank out the cells containing the data that you do not wish to use.

Simple Linear Regression. The following procedure is used for a simple linear regression, that is, a non-cross-validated, unweighted fit with an adjustable intercept and one independent variable. In this case, one module provides all the required information, including a graph. After preparing the spreadsheet, click on Curve Fitting in the Analysis menu and select the Growth and Other Models module. The NCSS template for the Growth module will appear. The template, or window, has a box at the top for entering and selecting values and options. Accept the default settings with the following changes:

1) Select the dependent variable by clicking on the line labeled Y Variable. The line containing its current, default value will turn blue. Enter the label of the column in the box at the top of the template. If you click on the arrow to the right of the box, you will be prompted for the options. Press the Enter key to execute the change.

2) Similarly select the independent variable and Model Type. A variety of set models in addition to y = mx + b is provided.

3) Select the regression report by clicking on the line labeled Show Results Report and enter Yes in the box at the top of the template. Similarly deselect the probability plots by changing the default for the two lines labeled Show Probability Plot from Yes to No.

4) If you want a graph of the data, select Function Plot. You can define the range of x and y values in the graph by entering values for Xmin, Xmax, Ymin, and Ymax. Also enter a caption for the graph in the Title field.

After you have prepared the template for the calculation, click on the Run button in the template and then on Run Procedure. The result will appear in the NCSS Output Window. You can print the result by clicking on Print in the File menu of the Output window.
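As an independent check on the numbers NCSS reports for this case, the unweighted straight-line fit can be reproduced by hand. The following is a minimal Python sketch of the ordinary least-squares formulas, not the routine NCSS itself uses; the function name simple_linear_fit and the demonstration data are hypothetical and chosen only for illustration.

```python
import math


def simple_linear_fit(x, y):
    """Unweighted least-squares fit of y = m*x + b (hypothetical helper)."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least three points to estimate the errors")
    x_mean = sum(x) / n
    y_mean = sum(y) / n

    # Sums of squares and cross products about the means.
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    syy = sum((yi - y_mean) ** 2 for yi in y)

    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    # Residuals, degrees of freedom, and the mean square of the residuals.
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    sse = sum(r ** 2 for r in residuals)
    dof = n - 2  # two fitted parameters: slope and intercept
    mse = sse / dof

    return {
        "slope": slope,
        "intercept": intercept,
        "se_slope": math.sqrt(mse / sxx),
        "se_intercept": math.sqrt(mse * (1.0 / n + x_mean ** 2 / sxx)),
        "dof": dof,
        "mean_square_error": mse,
        "r_squared": 1.0 - sse / syy,
    }


# Made-up demonstration data, not taken from the handout.
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0]
y_demo = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = simple_linear_fit(x_demo, y_demo)
for name, value in fit.items():
    print(f"{name}: {value:.5g}")
```

The dictionary keys are illustrative names only; they are not the labels that appear in the NCSS report.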

A typical report generated by the Curve Fit module is given on the next two pages. The module uses a non-linear regression routine, and a report of the progress to convergence is given at the top of the report. The relevant results for Chemistry 160 are given in the Model Estimation section of the report. This section provides, in a format similar to that of Excel, the following information for each parameter: its best-fit value, its standard deviation, and the 95% confidence interval. The value of R² is also provided. The line labeled Error in the Analysis of Variance Table contains the degrees of freedom in the DF column and the square of the standard deviation of the residuals in the Mean Square column. Following this is the Correlation Matrix, from which the covariance matrix can be generated. To obtain element i,j in the covariance matrix, multiply element i,j in the correlation matrix by the product of the standard errors for parameters i and j. For example, the covariance of the intercept and the slope is given by the product (-0.904534)(0.37743)(0.11380); the variance of the slope, by the product (1)(0.11380)(0.11380).

The second page contains the graph and information relating to the values of the dependent variable predicted from the regression equation: the predicted values of y, the confidence intervals of the predicted values, and the residuals (the deviations between the observed and predicted values).
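The conversion from correlation matrix to covariance matrix described above can be written out in a few lines. The sketch below is in Python and uses only the three numbers quoted in the example (the correlation -0.904534 and the standard errors 0.37743 for the intercept and 0.11380 for the slope); the function name covariance_from_correlation is hypothetical.

```python
def covariance_from_correlation(corr, std_errors):
    """Element i,j of the covariance matrix is corr[i][j] * se[i] * se[j]."""
    n = len(std_errors)
    return [
        [corr[i][j] * std_errors[i] * std_errors[j] for j in range(n)]
        for i in range(n)
    ]


# Parameter order: index 0 = intercept, index 1 = slope.
corr = [
    [1.0, -0.904534],
    [-0.904534, 1.0],
]
std_errors = [0.37743, 0.11380]

cov = covariance_from_correlation(corr, std_errors)
print("cov(intercept, slope) =", cov[0][1])
print("var(slope)            =", cov[1][1])
```

With these inputs the covariance of the intercept and slope comes out at about -0.0389, and the variance of the slope at about 0.01295.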

The Curve Fit module has one drawback. It is based on a non-linear regression routine and occasionally poor results are generated by the default initial estimates of the parameters. If the module generates large confidence intervals or residuals from a dataset that appears to be acceptable, either adjust the initial estimates of the parameters or use the Multiple Linear Regression routine. The latter is bulletproof.

Multiple Linear Regression. Other cases, including a weighted regression, more than one independent variable, cross-validated regressions, and a zero intercept, require the use of the Multiple Regression module. Alas, this module generates neither a scatter plot nor the full correlation matrix. These features may appear in future releases of the software. To access this module, which generates an impressive amount of information, click on the Regression/Correlation item in the Analysis menu and select Multiple Regression. You will need to edit the NCSS template using the approach discussed above. Consider the following items:

1) Provide the identity of the dependent variable in the Y Variable field.

2) Provide the identity of the independent variables in the X Variable field. You have the option of selecting more than one column in the spreadsheet.

3) If you wish to weight the data and have defined a spreadsheet column containing the weights, select the Weight Variable.

4) If the intercept should be forced through zero, change the status of the Remove Intercept field from the default No value to Yes.

5) You will probably want to de-select most of the Plot options. None will provide a required scatter plot. If you require a graph, you will have to use one of the graph functions or the Growth module in NCSS. However, the graph generated may not correspond exactly to the equation generated by the Multiple Regression module. For example, if you correctly fit absorbance to concentration with the model A = εlc, the Multiple Regression module will handle the problem correctly and force the line through the origin. All the other modules will fit the data to an equation of the form y = mx + b, and the intercept might be small but will not be exactly zero.

6) In every case, select the following reports: Coefficient, ANOVA, and Residual.

7) If you have more than one independent variable, also select the reports for Correlation Matrix, R-Squared, Variable Omission, and Sequential Models.

8) Refer to the Help file for the other types of information which the versatile Multiple Regression module can provide.

When the template has been prepared, click on Run and then Run Procedure. The results will appear in the NCSS Output window.
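For the zero-intercept and weighted cases listed above, the one-variable fit has a simple closed form that can serve as a cross-check on the report. The following Python sketch fits y = m*x through the origin with optional weights. It assumes the weights multiply the squared residuals (for example 1/variance), which may or may not match the NCSS convention, so consult the Help file before comparing numbers. The function name and the data are hypothetical.

```python
import math


def fit_through_origin(x, y, weights=None):
    """Least-squares fit of y = m*x with no intercept (hypothetical helper).

    Assumption: each weight multiplies the squared residual of its point.
    """
    if weights is None:
        weights = [1.0] * len(x)  # unweighted case
    if len(x) < 2:
        raise ValueError("need at least two points to estimate the error")

    swxx = sum(w * xi * xi for w, xi in zip(weights, x))
    swxy = sum(w * xi * yi for w, xi, yi in zip(weights, x, y))
    slope = swxy / swxx

    # Only one parameter is fitted, so the degrees of freedom are n - 1.
    sse = sum(w * (yi - slope * xi) ** 2 for w, xi, yi in zip(weights, x, y))
    dof = len(x) - 1
    se_slope = math.sqrt(sse / dof / swxx)
    return slope, se_slope, dof


# Made-up absorbance data at four concentrations, for illustration only.
concentration = [0.10, 0.20, 0.30, 0.40]
absorbance = [0.052, 0.098, 0.151, 0.199]
slope, se_slope, dof = fit_through_origin(concentration, absorbance)
print(f"slope = {slope:.4f} +/- {se_slope:.4f} (DF = {dof})")
```

Note that the degrees of freedom differ from the adjustable-intercept case because one fewer parameter is estimated.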

The output is illustrated on the next two pages for the same linear case discussed above. If the Correlation Matrix report is selected, the covariance between the coefficients, i.e. between the parameters associated with the independent variables, can be calculated from the correlation matrix that is displayed. The correlation matrix is complete for the case in which the intercept is zero. When the intercept is an adjustable parameter, the full correlation matrix is not given, and the covariance(s) between the intercept and the coefficient(s) cannot be calculated from the information provided. If you require the full correlation matrix, including the matrix elements for the intercept, try the Nonlinear Regression module selected from the Regression options in the Analysis menu. There is no free lunch: the Nonlinear Regression routine is more difficult to use and does not handle weighting.

The Regression Coefficient section gives the best-fit values of the parameters, their standard deviations, and 95% confidence intervals. The column Standardized Coefficient is useful if there is more than one independent variable. The values give a measure of the relative contribution of each independent variable to the model. The last item in the section, T-Critical, is the value of Student’s t for the degrees of freedom.
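To give a feel for what a standardized coefficient is, the sketch below uses the usual textbook definition: the regression coefficient multiplied by the ratio of the sample standard deviation of the independent variable to that of the dependent variable. This is an assumption about how NCSS computes the column, not a statement from the NCSS documentation. The function name and data are hypothetical, and only the one-variable case is shown, where the result reduces to the correlation coefficient.

```python
import statistics


def standardized_slope(x, y):
    """Slope of y on x rescaled by stdev(x)/stdev(y) (textbook definition)."""
    x_mean = statistics.mean(x)
    y_mean = statistics.mean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    # Rescaling makes the coefficient unitless and comparable across variables.
    return slope * statistics.stdev(x) / statistics.stdev(y)


# Made-up demonstration data, not taken from the handout.
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0]
y_demo = [2.1, 3.9, 6.2, 7.8, 10.1]
beta = standardized_slope(x_demo, y_demo)
print(f"standardized coefficient = {beta:.4f}")
```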

The Analysis of Variance section provides, on the line labeled Model, the F statistic used to determine whether there is a relationship and the probability that the apparent relationship could be generated randomly. The line labeled Error contains the degrees of freedom (DF) and the square of the standard deviation of the residuals for a non-cross-validated fit (Mean Square). The standard deviation of the residuals for a non-cross-validated fit (Root Mean Square Error) is given in the first line of the next set of numbers, which also includes the non-cross-validated R² (R-Squared). The Press Value is the sum of the squares of the residuals for a cross-validated fit. The Press R-Squared is Q², the R² for a cross-validated fit.
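The cross-validated quantities can be illustrated with a leave-one-out calculation: each point is omitted in turn, the line is refitted to the remaining points, and the omitted point is predicted from that fit. The Python sketch below does this for a single independent variable with an adjustable intercept. It assumes Q² is computed as 1 minus the Press Value divided by the total sum of squares about the mean, which is the common definition; the function names and data are hypothetical.

```python
def _fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = m*x + b."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def press_statistics(x, y):
    """Return (PRESS, Q squared) from a leave-one-out cross-validation."""
    x = list(x)
    y = list(y)
    n = len(x)
    if n < 3:
        raise ValueError("need at least three points for leave-one-out")

    press = 0.0
    for i in range(n):
        # Refit without point i, then predict point i from that fit.
        slope, intercept = _fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2

    y_mean = sum(y) / n
    ss_total = sum((yi - y_mean) ** 2 for yi in y)
    return press, 1.0 - press / ss_total


# Made-up demonstration data, not taken from the handout.
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0]
y_demo = [2.1, 3.9, 6.2, 7.8, 10.1]
press, q_squared = press_statistics(x_demo, y_demo)
print(f"PRESS = {press:.4f}, Q^2 = {q_squared:.4f}")
```

Because each prediction is made without the point being predicted, the Press Value is never smaller than the ordinary sum of squared residuals, so Q² is never larger than R².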

The second page of the illustrative report contains a table of the predicted values and the residuals. Please refer to the Help file for a discussion of the other potentially useful tables which can be generated by NCSS.

ncss.doc

2 January 2002, revised, WES
