An Introduction to Multivariate Polynomial Regression (MPR)



Bibliography

This bibliography lists published work related to Multivariate Polynomial Regression (MPR), along with literature on related topics such as nonlinear regression and the use of nonlinear models in process control.

MPR and PARX

S. Chen and Billings, S.A., Representations of non-linear systems: the NARMAX model. Int. J. Control, v49, n3, 1013-1032 (1989).

Introduces the NARMAX model. MPR and Volterra models are described as special cases of NARMAX. Also discusses affine, bilinear and rational models, but settles on polynomial form.

Billings, S.A., and S. Chen, Extended model set, global data and threshold model identification of severely non-linear systems. Int. J. Control, v50, n5, 1897-1923 (1989).

Applies NARMAX model (essentially same as PARX), gives apparently different procedure for selecting significant terms.

Li, C.J. and Y.C. Jeon, "Genetic Algorithm in Identifying Nonlinear Autoregressive with Exogenous Input Models for Nonlinear Systems", Proc. of the Am. Control Conference, San Francisco, CA, pp2305-2309 (June 1993).

Bard, Y. and L. Lapidus, "Nonlinear System Identification", Ind. Eng. Chem. Fundam., v9, n4, p628-633 (1970).

Restricted to Volterra model (lagged outputs not used as independent variables). Uses idea of selecting terms by checking correlation of residuals of current model with candidate terms. Nothing is new under the sun! Calls method "extremely promising". Tests it with 4-CSTRs-in-series.

NOTE: I have tested this idea and unfortunately found it seriously lacking. Using the retail outlet data I found that a significant variable might not correlate with residuals if a current term is correlated with the candidate term, yet contributes less to the model.
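
The attenuation described in this note can be reproduced on synthetic data. The sketch below is a minimal illustration, not the retail outlet analysis: the variables x1 and x2, the noise level, and the helper functions are all assumptions made for the example. The response depends only on x2, the current model contains only x1, and x2 is nearly collinear with x1.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def pearson(a, b):
    """Sample Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def residuals(x, y):
    """Residuals of a one-variable least-squares fit y ~ a + b*x."""
    mx, my = mean(x), mean(y)
    slope = (sum((u - mx) * (v - my) for u, v in zip(x, y))
             / sum((u - mx) ** 2 for u in x))
    intercept = my - slope * mx
    return [v - (intercept + slope * u) for u, v in zip(x, y)]


rng = random.Random(1)  # fixed seed so the illustration is repeatable
n = 500
x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]      # term already in the model
x2 = [u + 0.1 * rng.gauss(0.0, 1.0) for u in x1]  # candidate, collinear with x1
y = [3.0 * v for v in x2]                         # response driven by x2 only

resid_y = residuals(x1, y)    # what the current model (x1 only) leaves unexplained
resid_x2 = residuals(x1, x2)  # part of x2 that x1 cannot reproduce

raw_corr = pearson(x2, resid_y)            # the screening statistic
partial_corr = pearson(resid_x2, resid_y)  # correlation after removing x1 from both
```

In this construction the raw correlation between x2 and the residuals is small (roughly 0.1), even though x2 accounts for all of the remaining variation, which the partial correlation near 1 makes visible.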

MPR-related publications by D. Vaccari

Vaccari, D.A. and J. Levri, “Multivariable Empirical Modeling of ALS Systems Using Polynomials,” Life Support and Biosphere Science, vol. 6 pp. 265-271, (1999).

"Correlations of Performance for Activated Sludge Using Multiple Regression with Autocorrelation", Christodoulatos, C., D.A. Vaccari, Water Research, v27, n1, pp51-62 (1993).

"Generalized Multiple Regression Techniques with Interaction and Nonlinearity for System Identification in Biological Treatment Processes", D.A. Vaccari and C. Christodoulatos, in Instrumentation Society of America Transactions, v31, n1, pp97-102 (1992).

“Nonlinear Analysis of Retail Performance”, D. A. Vaccari, IEEE/IAFE Conference on Computational Intelligence for Financial Engineering, New York, NY, March 24-26, 1996.

“Predicting Process Performance with Polynomials”, D.A. Vaccari and E. Wojciechowski, WEF Specialty Conf. “Automating to Improve Water Quality”, Minneapolis, MN, June 25-28, 1995.

“Nonlinear Control of a Wastewater Treatment Plant Using Multivariate Polynomial Models”, D.L. McMahon, S.L. Rivera and D.A. Vaccari, presented at the 1995 AIChE Annual Meeting, Miami Beach, FL.

"Membrane Air Stripping Utilizing a Plate and Frame Configuration", Boswell, S.T. and D.A. Vaccari, presented at the 21st Annual Conference of the ASCE Water Resources Planning and Management Division, Denver, Colorado, May 1994.

Vaccari, D.A., and L. Walden, “Polynomial Time-Series Modeling of Predator-Prey Population Dynamics”, presented at the Int’l. Soc. Of Ecological Modeling Conf., Baltimore MD (August 1998).

Wojciechowski, E., and D.A. Vaccari, “Techniques to Avoid Pitfalls in Empirical Modeling”, 29th Int’l Conf. On Environmental Systems, Denver, CO, July 12-15, 1999.

D.A. Vaccari, Zhaoyan Wang, J. Cavazzoni, K. Kumasaka, “Modeling CO2 Uptake In Soybean Plants Using Multivariate Polynomial Regression,” presented at the American Society for Gravitational Biology and Space Science Conf., Alexandria, VA, November, 2001.

Vaccari, D.A., and Z. Wang, “Computing confidence intervals for multivariate polynomial models of plant gas exchange,” presented at the NASA Bioastronautics PI Conference, Galveston, TX, January 2003.

Other Nonlinear Regression References

Vicino, A., R. Tempo, R. Genesio, and M. Milanese, Optimal Error and GMDH Predictors, A Comparison with Some Statistical Techniques, Int. J. of Forecasting 3 (1987) 313-328.

Compared ARMA, threshold AR, bilinear, GMDH (restricted polynomial) and their own "optimal error" methods.

Mathews, V.J., Adaptive Polynomial Filters. IEEE SP Magazine (July 1991).

Electrical engineering application, discusses Volterra filters, circuits to implement.

Schetzen, Martin, Nonlinear System Modeling Based on the Wiener Theory, Proc. of the IEEE, v69, n12, p1557-1573 (1981).

A good explanation of Volterra series. He points out there are infinite-memory systems (e.g. a fuse) that cannot be described by Volterra or Wiener models. (Because outputs aren't used as inputs; thus MPR with lagged outputs can handle this.) He also describes Hermite polynomials, which are orthogonal with respect to a Gaussian weighting function.

Masri, S.F. and T.K. Caughey, A Nonparametric Identification Technique for Nonlinear Dynamic Problems, Transactions of the ASME J. of Applied Mechanics, v46, p433-445 (June 1979).

Very similar to MPR. Uses Chebyshev polynomials to identify several oscillators including Duffing and van der Pol.

Cao, C.Q. and R.S. Tsay, Nonlinear Time-Series Analysis of Stock Volatilities, J. of Applied Econometrics, v7, S165-S185 (1992).

Compares Threshold Autoregressive (TAR), autoregressive conditional heteroscedastic (ARCH), generalized ARCH (GARCH) and exponential GARCH (EGARCH) models with ARMA.

Ventresca, C., "Continuous process improvement through designed experiments and multi-attribute desirability optimization", ISA Transactions v32, n1, pp51-64 (1993).

Describes polynomial interaction models to correlate seven responses for noodle processing (e.g. color, breaking stress, firmness, etc.) to five independent variables (water absorption, dough pH, mixing time, roll speed, roll gap reduction). Original work done by Oh, et al. Refers to book by Box & Draper which looks important! Also see Derringer and Suich:

Derringer, G. and R. Suich, "Simultaneous optimization of several response variables", J. Quality Technol., v12, n4 (1980).

Box, G.E.P., and N.R. Draper, "Empirical Model-Building and Response Surfaces", (Wiley, 1987).

Myers, R.H., "Response Surface Methodology" (Allyn and Bacon, Inc., Boston, 1971).

These two books describe the use of response surfaces for design of experiments. Response surfaces would be another application for MPR models. These books limit themselves to "second order models", in which the exponent of any variable is no more than two.

Ellner, S., and P. Turchin, “Chaos in a noisy world: New methods and evidence from time-series analysis”, The American Naturalist, v145, n3, pp343-375 (1995).

Applies three kinds of nonlinear time-series analysis to many ecological datasets. The three models are ANNs, splines, and Response Surface Method (RSM), based on Box and Draper (1987). RSM is a polynomial! Except, they use exponentially transformed dependent variables. They give a list of possible exponents {-1, -0.5, 0, 0.5, ..., 3} where 0 represents log transform. They apparently do an exhaustive search of all possible polynomials until they get the highest R². However, their R² values seem low. They have a good list of both lab and field population data. They also compute Lyapunov exponents for all data/models; just about all are negative.

They use an interesting nonlinear AR model for testing, called the Ricker model: N(t+1) = N(t)*exp[r*(1-N(t)) + 0.15*Z(t)], where N(t) is the population at time t, Z(t) is a Gaussian random variable with mean zero and standard deviation σ = 1, and r = 1.5 (nonchaotic) or r = 3 (chaotic).
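
The Ricker model quoted above is easy to iterate directly, which is handy for generating test series. The following is a minimal sketch: the function name, the starting population n0 = 0.5, and the fixed seed are assumptions for illustration and do not come from Ellner and Turchin.

```python
import math
import random


def simulate_ricker(r, n_steps, n0=0.5, noise_scale=0.15, seed=0):
    """Iterate N(t+1) = N(t) * exp(r * (1 - N(t)) + noise_scale * Z(t)).

    Z(t) is drawn as independent standard Gaussian noise.
    n0 and seed are illustrative choices, not values from the paper.
    """
    rng = random.Random(seed)
    series = [n0]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        n = series[-1]
        series.append(n * math.exp(r * (1.0 - n) + noise_scale * z))
    return series


stable = simulate_ricker(r=1.5, n_steps=200)   # nonchaotic regime
chaotic = simulate_ricker(r=3.0, n_steps=200)  # chaotic regime
```

Setting noise_scale=0.0 gives the deterministic map, which for r = 1.5 settles at the equilibrium N = 1.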

Turchin, P. “Chaos and stability in rodent population dynamics: evidence from nonlinear time-series analysis”, Oikos, 68:1 (1993), pp 167-172.

More RSM. Note structure is selected a priori, not identified. Good discussion of “integral role that noise plays in dynamics of any real population”. You can have “deterministically damped oscillations which are prevented from converging to an equilibrium by noise”. Justifies using noisy data with model to evaluate Lyapunov exponents. Detrended the data with a quadratic trend. Found temperate populations were nonchaotic, but northern ones tended towards chaos.

Perry, J.N., Woiwod, I.P., Hanski, I., Using response-surface methodology to detect chaos in ecological time series. Oikos; acta oecologia scandinavica. Dec 1993, v68, n3 (also listed as n2) p329.

Proposed improvement to Turchin and Taylor’s RSM model; uses nonlinear regression to determine exponents. Discusses what to do if model is out of range, described as “ellipsoid in phase space”. Do not like using constraints. Found models sensitive to additional data. Recommend at least ten, and preferably more than fifteen degrees of freedom (they are dealing with short ecological time series).

Tartakovskii, B.A., A.V. Kazakov, and E.V. Lipovskaya, "Use of the Nonlinear Stepwise Regression Method in Reconstructing the Structure of Kinetic Relationships for Biosynthesis Processes", Moscow Food Industry Technological Institute. Translated from Teoreticheskie Osnovy Khimicheskoi Tekhnologii, v24, n2, pp 211-217, March-April, 1990, obtained from Plenum Publishing Corp.

Lewis, P.A. and J.G. Stevens, Nonlinear Modeling of Time Series Using Multivariate Adaptive Regression Splines (MARS). J. Am. Statistical Assoc. v86, n416, Dec 1991, Applications and Case Studies.

Speyrer, J.F. and W.R. Ragas, Housing Prices and Flood Risk: An Examination Using Spline Regression, J. of Real Estate Finance and Economics, 4:395-407 (1991).

Example of correlation analysis for house price prediction.

Optimization

Rivera, S.L. and M.N. Karim, "Optimization of a Bioprocess using Genetic Algorithms", for presentation at AIChE 1992 Annual Meeting, Nov 1-6, 1992, Miami, FL.

Uses "microgenetic" algorithm.

Nonlinear Control Theory

Hernandez, E. and Y. Arkun, Control of Nonlinear Systems Using Polynomial ARMA Models, AIChE Journal, v39, n3, p446-460 (1993).

Ozgulsen, F., S.J. Kendra, and A. Cinar, Nonlinear Predictive Control of Periodically Forced Chemical Reactors, AIChE Journal v39, n4, p589 (1993).

Wu, X. and A. Çinar, “An Automated Knowledge-Base System for Nonlinear System Identification”, Gensym User Society Meeting, May 26-28, Cambridge, MA (1993).

Casti, J.L., Recent Developments and Future Perspectives in Nonlinear System Theory, SIAM Review, v24, n2 (July 1992)

Garfinkel, A., M.L. Spano, W.L. Ditto, J.N. Weiss, “Controlling Cardiac Chaos”, Science, v257, 1230-1235 (1992).

Nonlinear Time Series

Casdagli, Martin & Stephen Eubank, "Nonlinear Modeling and Forecasting", Proc. of the Workshop on Nonlinear Modeling and Forecasting, Sept. 1990, Santa Fe, NM. Also published as SFI Studies in the Sciences of Complexity, Vol XII, Addison-Wesley, 1992.

Includes papers on

Prediction of Chaotic Time Series using CNLS-Net (Mead, Jones, et al)

Forecasting with Weighted Maps (Stockbro & Umberger)

Diag. testing for nonlinearity, chaos, and general dependence in time-series data (Brock & Potter)

Experim. in modeling nonlinear relat. in time series (Granger & Terasvirta):

blind application of polynomial models (no term selection)

Local & global Lyapunov exponents (Abarbanel)

Local forecasting of High-D chaotic dynamics (Meyer & Packard)

Nonlinear forecasts for S&P stock index (LeBaron)

Sunspot prediction with ANNs (Weigend, Huberman and Rumelhart)

Granger, C.W.J. and T. Terasvirta, “Modelling Nonlinear Economic Relationships”, Oxford University Press (1993).

Has definitions of Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC).
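
For quick reference, the sketch below computes both criteria in the form commonly used for least-squares models with Gaussian errors. This is an assumption for illustration: Granger and Terasvirta's exact normalization is not reproduced here and may differ by a constant or a factor of the sample size. The function name is hypothetical.

```python
import math


def aic_bic(rss, n_obs, n_params):
    """Return (AIC, BIC) for a least-squares fit, up to an additive constant.

    rss      : residual sum of squares of the fitted model
    n_obs    : number of observations
    n_params : number of estimated coefficients
    """
    log_var = math.log(rss / n_obs)                     # log of the residual variance estimate
    aic = n_obs * log_var + 2 * n_params                # penalty of 2 per parameter
    bic = n_obs * log_var + n_params * math.log(n_obs)  # penalty of ln(n) per parameter
    return aic, bic


aic, bic = aic_bic(rss=100.0, n_obs=100, n_params=3)
```

Lower values are preferred under both criteria. Because ln(n) exceeds 2 once n is 8 or more, BIC penalizes added terms more heavily than AIC for all but the smallest data sets, so it tends to select smaller models.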

Shaw, C.T. and G.P. King, Using cluster analysis to classify time series. Physica D 53 (1992) 288-298.

Applied cluster analysis to vortex street oscillations.

Noack, B.R., F. Ohle and H. Eckelmann, Construction and analysis of differential equation from experimental time series of oscillatory systems. Physica D 56 (1992) 389-405.

Andersen, A.P. and N.R. Lomb, Yet Another Look at Two Classical Time Series. in Analyzing Time Series, O.D. Anderson, ed., North-Holland Publishing Co. (1980).

Tries ARIMA, fractional ARIMA, fractional differencing, and bilinear methods on Lynx and Sunspot data.

Vaidyanathan, R. and R. Krehbiel, Does the S&P 500 Futures Mispricing Series Exhibit Nonlinear Dependence across Time?, The Journal of Futures Markets, v12, n6, 659-677 (1992).

They conclude "yes" and find evidence of chaos.

Haber, R. and H. Unbehauen, "Structure Identification of Nonlinear Dynamic Systems -- A survey on Input/Output Approaches", Automatica, v26, n4, pp651-677 (1990).

Discusses various forcings (impulse, step, frequency, normal operating data) to identify various models, especially linear-in-parameters. Seems to have some focus on polynomial models: mentions Volterra, Wiener-Hammerstein models, GMDH. Good review of stepwise polynomial regression. Compares several criteria (AIC, F, Mallows', etc.).

Bard, Y. and Lapidus (1970). Nonlinear system identification. Ind. Engng. Chem. Fund., v9, pp628-633.

Demonstrates stepwise selection using t-stat for term inclusion.

Proll, T. and M.N. Karim, "Real-time design of an adaptive nonlinear predictive controller".

This paper describes a controller based on NARX model for controlling wastewater pH neutralization.

Kurths, J. and A.A. Ruzmaikin, “On Forecasting the Sunspot Numbers”, Solar Physics, v126, pp 407-410 (1990).

Use a linear phase-space embedding model to predict sunspot numbers. Results are presented only as a plot of the time series.

Chaos

Briggs, K., “An improved method for estimating Liapunov exponents of chaotic time series.” Physics Letters A, 26 Nov. 1990, vol. 151, (no.1-2):27-32.

Abstract: Discusses the calculation of Liapunov exponents from experimental data. It is suggested that the estimation of Jacobian matrices is best achieved in the case of noisy data by least-squares polynomial fitting. The improvement obtained over the standard method of linear fitting is demonstrated in several model systems. The author also applies the technique to the time-series forecasting problem.

Above is similar to the next two references:

Brown, R., P. Bryant, and H.D.I. Abarbanel, “Lyapunov exponents from observed time series,” Physical Review Letters, 24 Sept. 1990, vol.65, (no.13):1523-6.

Brown, R., P. Bryant, and H.D.I. Abarbanel, “Computing the Lyapunov spectrum of a dynamical system from an observed time series.” Physical Review A, 15 March 1991, vol.43, (no.6):2787-806.

Abstract (by Keith Briggs, U. of Adelaide): The authors examine the question of accurately determining, from an observed time series, the Lyapunov exponents for the dynamical system generating the data. This includes positive, zero, and some or all of the negative exponents. They show that even with very large data sets, it is clearly advantageous to use local neighborhood-to-neighborhood mappings with higher-order Taylor series, rather than just local linear maps as has been done previously. They give examples using up to fifth-order polynomials. They demonstrate this procedure on two familiar maps and two familiar flows: the Henon and Ikeda maps of the plane to itself, the Lorenz system of three ODEs, and the Mackey-Glass delay differential equation. They stress the importance of maintaining two dimensions for converting the scalar data into time delay vectors: one is a global dimension to ensure proper unfolding of the attractor as a whole, and the other is a local dimension for capturing the local dynamics on the attractor. They show the effects of changing the local and global dimensions, changing the order of the mapping polynomial, and additive (measurement) noise. There will always be some limit to the number of exponents that can be accurately determined from a given finite data set. They discuss a method of determining this limit by numerically obtaining the singularity spectra of the data set and also show how it is often appropriate to make this choice based on the fractal dimension of the attractor. If excessively large dimensions are used, spurious exponents will be generated, and in some cases the accuracy of the true exponents will be affected. They present a method of identifying these spurious exponents by determining the Lyapunov direction vectors at particular points in the data set. They then use these to identify numerical problems and to associate data-set singularities with particular exponents. The behavior of spurious exponents in the presence of noise is also investigated, and found to be different from that of the true exponents. These provide methods for identifying spurious exponents in the analysis of experimental data where the system dynamics may not be known a priori.

Abarbanel, H.D.I., R. Brown, M.B. Kennel, “Lyapunov exponents in chaotic systems: their importance and their evaluation using observed data.” International Journal of Modern Physics B, 20 May 1991, vol.5, (no.9):1347-75.

J.-P. Eckmann et al, “Liapunov exponents from time series,” Phys. Rev. A34, 4971 (1986).

Casdagli, M. "Nonlinear Prediction of Chaotic Time Series", Physica D 35 (1989) 335-356.

Compared polynomials, artificial neural networks, etc. All were superior in different cases. His MPR may have been restricted. “Instead of calculating invariants, we attempt to construct a predictive model directly from time series data.”

Sugihara, G. and R.M. May, "Nonlinear forecasting as a way of distinguishing chaos from measurement error in time series", Nature, v344, pp734-741 (1990).

Applies exponentially weighted local method to tent function, measles and chickenpox univariate time series.

Wales, D.J., "Calculating the rate of loss of information from chaotic time series by forecasting", Nature, v350, pp 485-488 (1991).

Uses Sugihara and May's model to estimate largest Lyapunov exponent for real data, as indication of chaos.

Gouesbet, G. and J. Maquet, Construction of phenomenological models from numerical scalar time series. Physica D 58 (1992) 202-215.

Something like Noack, et al's system of ODEs. They postulate coupled 1st order ODEs; I think it will lead to a restricted form of MPR model (sort of like differencing the data to high order). Polynomial terms are selected based on a priori knowledge of underlying PDEs. My statistical approach can be applied to their method, I believe.

Ridley, M. "The Mathematics of Markets", The Economist, 9 October 1993.

Good review of the use of models in market forecasting. Mentions methods of May, Takens.

Wolf, A., J.B. Swift, H.L. Swinney, J.A. Vastano, "Determining Lyapunov exponents from a time series", Physica 16D (1985) 285-317

Provides FORTRAN code for computing Lyapunov exponents using both ODEs of motion and time-series data.

Wolf, A., "Quantifying chaos with Lyapunov exponents", chapter 13 in "Chaos", ed by A.V. Holden, Princeton U. Press (1986).

Summarizes Physica D paper, also elaborates on use of code for discrete mappings, such as the Henon map. This would apply to maps such as the quadratic iterator or to polynomial models.

Schaffer, W.M., and M.Kot, "Differential systems in ecology and epidemiology", chapter 8, pp 158-178 in "Chaos", ed by A.V. Holden, Princeton U. Press (1986).

Discusses chaotic occurrences of Lotka-Volterra equations. Applies Takens embedding method to disease data.

Neural Networks

Chatfield, C. Neural Networks: Forecasting breakthrough or passing fad? Int. J. of Forecasting 9, (1993) 1-3 (North-Holland).

Critical editorial from statistical point of view.

White, H., Some Asymptotic Results for Learning in Single Hidden-Layer Feedforward Network Models. J. Am. Statistical Assoc., Dec 1989, v84, n408, p1003-1013.

Chen, S., S.A. Billings, and P.M. Grant, Non-linear system identification using neural networks, Int. J. Control, v51, n6, 1191-1214 (1990).

Mentions NARMAX models, then moves on to ANNs

Weigend, A.S., B.A. Huberman, D.E. Rumelhart, “Predicting the Future: A Connectionist Approach”, Int’l J. of Neural Systems, v1, n3, pp193-209 (1990).

Describe fitting technique using training set to compute parameters of model, validation set to decide when to stop training (to prevent overfitting), and a separate prediction set to test the performance. Also describe a weight-elimination method to prune the network.
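
The three-way split described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: a one-variable linear model trained by gradient descent stands in for the neural network used by Weigend et al., and the synthetic data, split sizes, learning rate, and patience value are all hypothetical choices. The weight-elimination step is not shown.

```python
import random


def mse(w, b, data):
    """Mean squared error of the model y = w*x + b on a list of (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train_with_early_stopping(train, valid, lr=0.05, max_epochs=500, patience=10):
    """Fit on `train`, stop when error on `valid` has not improved for `patience` epochs."""
    w, b = 0.0, 0.0
    best_err, best_w, best_b = mse(w, b, valid), w, b
    stale = 0
    for _ in range(max_epochs):
        # One full-batch gradient step using the training set only.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in train) / len(train)
        grad_b = sum(2 * (w * x + b - y) for x, y in train) / len(train)
        w, b = w - lr * grad_w, b - lr * grad_b
        # The validation set decides when to stop; it never drives the updates.
        err = mse(w, b, valid)
        if err < best_err:
            best_err, best_w, best_b, stale = err, w, b, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_w, best_b


rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(300)]
points = [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)) for x in xs]  # synthetic data
train, valid, test = points[:200], points[200:250], points[250:]

w, b = train_with_early_stopping(train, valid)
test_error = mse(w, b, test)  # honest estimate: the test set was never used in fitting
```

Keeping the prediction set out of both the parameter updates and the stopping decision is what makes test_error a fair measure of performance on new data.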

Regression, Statistics, Function Approximation

Box, G.E.P. and G.M. Jenkins, “Time Series Analysis, Forecasting and Control,” Holden-Day (1976).

Draper, N.R. and H. Smith, 1966, “Applied Regression Analysis” (John Wiley).

Tang, Z., C. de Almeida, P.A. Fishwick, 1991, “Time series forecasting using neural networks vs. Box-Jenkins methodology”, Simulation 57:5, 303-310.

Salahuddin and A.G. Hawkes, Cross-Validation in Stepwise Regression, Commun. Statist.-Theory Meth., 20(4), 1163-1182 (1991).

Gives several examples using multivariate data from the literature.

Routledge, R.D., When stepwise regression fails: correlated variables some of which are redundant, Int. J. Math Educ. Sci. Technol., v21, n3, 403-410 (1990).

Important example of problem which can occur if two of the independent variables are correlated.

Freedman, L.S., D. Pee, and D.N. Midthune, The problem of underestimating the residual error variance in forward stepwise regression. The Statistician, 41, pp405-412 (1992).

"Our investigations of forward stepwise regression . . . have revealed some serious discrepancies . . . when the ratio of variables to observations is high, e.g. greater than 0.25." They also state that omitting important variables results in biased estimates of regression coefficients and their statistics. But wouldn't this also be true of omitting nonlinear relationships? They consider the stepwise procedure to be reduced "to the level of exploratory data analysis".

Fahrmeir, L. and H. Frost, "On Stepwise Variable Selection in Generalized Linear Regression and Time Series Models". Computational Statistics (1992) 7:137-154.

They show stepwise works in some situations where other methods, such as all or partial subset selection, break down "due to non-existence of maximum likelihood estimates in sparse data situations".

Lorentz, G.G., "Representation of functions of several variables by functions of one variable", Chapter 11 in "Approximation of Functions" (Holt, Rinehart and Winston, Inc., 1966).

Describes Kolmogorov/Sprecher Theorem

Thompson, J.R. “Empirical Model Building”, Wiley 1989.

Modeling, Simulation-based, Exploratory Data Analysis, Paradoxes.
