Introduction to regression analysis - ecology lab



Correlations: Measuring and Describing Relationships

The direction of the relationship is indicated by the sign of the correlation (+ or -):
- A positive correlation means the two variables tend to change in the same direction: as one increases, the other also tends to increase (positive slope).
- A negative correlation means the two variables tend to change in opposite directions: as one increases, the other tends to decrease (negative slope).

The most common form of relationship is a straight-line, or linear, relationship, which is measured by the Pearson correlation between two variables.

The strength, or consistency, of the relationship is indicated by the numerical value of the correlation: a magnitude of 1.00 indicates a perfect relationship, and a value of zero indicates no relationship.

Correlation Example
- What are the two variables?
- What does the pattern suggest?
- How would you describe this relationship to someone?

The Pearson Correlation (most common)
- Measures the relationship by comparing the amount of covariability (variation shared between X and Y) to the amount that X and Y vary separately.
- Notice that it involves only two variables; it cannot by itself be used with multivariate regression equations.
- Its magnitude ranges from 0.00 (indicating no linear relationship between X and Y) to 1.00 (indicating a perfect straight-line relationship between X and Y).
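As a sketch of how the Pearson correlation compares covariability to the separate variability of X and Y, the following Python example computes r directly from its definitional formula. The data values are made up for illustration (a hypothetical ecology-lab measurement pair):

```python
import math

# Hypothetical data (made up): X = soil moisture, Y = plant height
x = [2.0, 3.5, 4.0, 5.5, 7.0, 8.5]
y = [10.1, 12.3, 13.0, 15.8, 18.2, 20.9]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Covariability of X and Y: sum of products of deviations (SP)
sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
# How much X and Y vary separately: sums of squared deviations
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)

# Pearson r = SP / sqrt(SSx * SSy); magnitude ranges from 0.00 to 1.00,
# and the sign gives the direction of the relationship
r = sp / math.sqrt(ss_x * ss_y)
print(round(r, 3))  # about 0.999: a very strong positive linear relationship
```

A value this close to +1.00 indicates a nearly perfect positive straight-line relationship in this made-up sample.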
The correlation can be either positive or negative, depending on the direction of the relationship.

Major weakness of the Pearson correlation
- Used alone, it is not enough for complex equations, since it takes multiple variables to predict an outcome, and there are other influences on those other variables (for example, multivariate regression equations).
- A correlation (+ or -) also exists between each pair of predictor (IV) variables.
- Because many variables are involved, the Pearson correlation may not work with all equations.

Introduction to Linear Equations & Simple Regression

The Pearson correlation measures the degree to which a set of data points forms a straight-line relationship. Regression, in general, is a statistical procedure that determines the equation for the straight line that best fits a specific set of data.
- The same procedure extends to the variables in a multivariate equation.
- It allows prediction of the DV from predictors that are categorical and/or continuous variables.
- The resulting straight line is called the regression line.
- There are two types of regression: simple (2 variables, covered here) and multivariate (many variables, covered later).

Linear Equations & Simple Regression

Any straight line can be represented by an equation of the form Y = a + bX, where a and b are constants.
- The value of b is called the slope constant and determines the direction and degree to which the line is tilted.
- The value of a is called the Y-intercept and determines the point where the line crosses the Y-axis.

Understanding the linear regression equation

How well a set of data points fits a straight line can be measured by calculating the distance between the data points and the line. Using the formula Ŷ = bX + a, it is possible to find the predicted value Ŷ ("Y hat") for any X. The error (distance) between the predicted value and the actual value is Ŷ − Y. Because some distances may be negative, each distance is squared so that all measures are positive.

A closer examination of the best-fit line

The total error between the data points and the line is obtained by squaring each distance and then summing the squared values:

    Total squared error = Σ(Y − Ŷ)²

The regression equation is designed to produce the minimum sum of squared errors. The best-fitting line has the smallest total squared error; it is called the least-squared-error solution and is expressed as Ŷ = bX + a, the equation for the regression line.

Analysis of a Simple Regression

Finally, the overall significance of the regression equation can be evaluated by computing an F-ratio. A significant F-ratio indicates that the equation predicts a significant portion of the variability in the Y scores (more than would be expected by chance alone), using the best combination of systematic (treatment) and unsystematic (error) variability.
- To compute the F-ratio, first calculate a variance (MS) for the predicted variability and for the unpredicted variability.
- Overall, we want to be able to say that there is little chance the result is error and that our treatment was the predictor that had an effect: p is the probability of random luck; the rest is attributable to our treatment.

Intro to Multiple Regression with 2 Predictor Variables (p. 157)

In the same way that linear regression produces an equation that uses values of X to predict values of Y, multiple regression produces an equation that uses two different variables (X1 and X2) to predict values of Y. The equation is determined by a least-squared-error solution that minimizes the squared distances between the actual Y values and the predicted Y values.

Review: two classes of variables in the model (p. 158)

Dependent variable (DV)
- The variable being predicted.
- Cannot be nominal, since categorical results are not graphable on a continuous scale.
- Known by several equivalent names: criterion variable, outcome variable, dependent variable.
- It must be present in every record/survey/data collection, since we cannot predict without it; use listwise deletion for missing values.

Independent variables (IV)
- The variables being used as predictors.
- Known by several equivalent names: predictor variables, independent variables, factors.
- If data are missing, use pairwise data cleaning.
- A literature review should uncover possible IVs that make a better prediction: take what has been done and add something to the equation to make it a better predictor, or look for a gap in the literature for research justification.

Multiple regression basics
- We always want to examine a prediction.
- We want to use multiple variables (multivariate) to predict an outcome variable more efficiently.

Statistical goal

The statistical goal of multiple regression is to produce a model, in the form of a linear equation, that identifies the best weighted linear combination of the independent variables in the study to optimally predict the criterion variable (p. 160).

Regression weights for the predictors

The weights will change based on the variables in the model.
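The least-squared-error solution and its F-ratio can be sketched in a few lines of Python. The data below are made up for illustration; for one predictor, the predicted variability has df = 1 and the unpredicted variability has df = n − 2:

```python
import math

# Hypothetical data (made up for illustration)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_x = sum((xi - mean_x) ** 2 for xi in x)

# Least-squared-error solution: Y-hat = bX + a
b = sp / ss_x            # slope constant
a = mean_y - b * mean_x  # Y-intercept

y_hat = [a + b * xi for xi in x]

# Unpredicted variability: total squared error = sum of (Y - Y-hat)^2
ss_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
# Predicted variability: squared deviations of Y-hat around the mean of Y
ss_regression = sum((yh - mean_y) ** 2 for yh in y_hat)

# F-ratio = MS(predicted) / MS(unpredicted)
ms_regression = ss_regression / 1        # df = 1 for one predictor
ms_residual = ss_residual / (n - 2)      # df = n - 2
f_ratio = ms_regression / ms_residual

print(f"Y-hat = {b:.2f}X + {a:.2f}, F = {f_ratio:.1f}")
```

A large F-ratio, as here, means the regression line predicts far more of the Y-score variability than would be expected by chance alone.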
The variables are relative to the overall model. We will look at the model as a whole to predict the total amount of variance explained (this is huge).

Fully specified regression model
- It depends on the variables that you selected.
- You cannot choose all the variables: parsimony matters.

Multiple Regression (raw-score model)

For two predictor variables, the general form of the multiple regression equation is:

    Ŷ = a + b1X1 + b2X2 + e

The ability of the multiple regression equation to accurately predict the Y values is measured by first computing the proportion of the Y-score variability that is predicted by the regression equation and the proportion that is not predicted. For a regression with two predictor variables:
- As with linear regression, the unpredicted variability (SS and df) can be used to compute a standard error of estimate that measures the standard distance between the actual Y values and the predicted values.
- In addition, the overall significance of the multiple regression equation can be evaluated with an F-ratio.

Standardized model (using z-scores)

This form is easier to use and to discuss when reporting the results of the regression model, because every variable is expressed in units with a mean of 0 and a standard deviation of 1:

    ẑY = β1z1 + β2z2 + … + βnzn

for example:

    ẑY = (0.31)(GPAz) + (0.62)(GREz)

You need to build, develop, and create a set of variables known as the variate (p. 164). This is the composite of all the independent variables on the right-hand side of the equation. Another way of thinking about this side of the equation is as a latent variable (an abstract notion, like academic aptitude).

Understanding the regression models
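A sketch of the standardized two-predictor model follows. The data are made up (so the β weights will differ from the 0.31/0.62 example above), and the β weights are computed from the standard two-predictor formula β1 = (rY1 − rY2·r12) / (1 − r12²), and symmetrically for β2:

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sp = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    ssa = sum((ai - ma) ** 2 for ai in a)
    ssb = sum((bi - mb) ** 2 for bi in b)
    return sp / math.sqrt(ssa * ssb)

# Hypothetical data (made up): X1 = GPA, X2 = GRE, Y = graduate performance
x1 = [3.0, 3.2, 3.5, 3.6, 3.8, 4.0]
x2 = [150, 155, 160, 158, 165, 170]
y  = [2.8, 3.0, 3.4, 3.5, 3.7, 3.9]

# Correlations among the three variables
r_y1 = pearson(y, x1)
r_y2 = pearson(y, x2)
r_12 = pearson(x1, x2)

# Standardized (beta) weights for the model z_Y-hat = beta1*z1 + beta2*z2
beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)
beta2 = (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2)

# Proportion of Y-score variability predicted by the model
r_squared = beta1 * r_y1 + beta2 * r_y2
print(f"beta1 = {beta1:.2f}, beta2 = {beta2:.2f}, R^2 = {r_squared:.3f}")
```

Note how the β weights depend on which predictors are in the model: because these two made-up predictors are highly correlated with each other, the weights come out quite different from the variables' individual correlations with Y. This is the sense in which the regression weights are relative to the overall model.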

