REVIEW QUESTIONS



Some useful formulas and results I

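Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Sample variance: $s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$

Sample covariance of x and y: $s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$

Sample correlation coefficient of x and y: $r = \frac{s_{xy}}{s_x s_y}$

Least squares regression line of y on x: $\hat{y} = b_0 + b_1 x$, with slope $b_1 = r\,\frac{s_y}{s_x} = \frac{s_{xy}}{s_x^2}$ and intercept $b_0 = \bar{y} - b_1\bar{x}$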
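
The descriptive formulas above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original sheet; the function name `sample_stats` and the data are assumptions made for the example.

```python
import math

def sample_stats(x, y):
    """Return sample means, variances, covariance, correlation and the
    least squares line of y on x, using the n - 1 divisor throughout."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    var_y = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    r = cov_xy / math.sqrt(var_x * var_y)
    slope = cov_xy / var_x              # same as r * s_y / s_x
    intercept = mean_y - slope * mean_x
    return {"mean_x": mean_x, "mean_y": mean_y, "var_x": var_x, "var_y": var_y,
            "cov_xy": cov_xy, "r": r, "slope": slope, "intercept": intercept}

# Hypothetical data, chosen only for illustration.
stats = sample_stats([1, 2, 3, 4], [2, 4, 5, 9])
```

For these four points the fitted line has slope 2.2 and intercept -0.5.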

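General addition rule for the union of two events: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$

Multiplication rule: $P(A \cap B) = P(A)\,P(B \mid A)$

Complement rule: $P(A^c) = 1 - P(A)$

Definition of conditional probability: $P(B \mid A) = \frac{P(A \cap B)}{P(A)}$, provided $P(A) > 0$

Independence: events A and B are independent if and only if $P(A \cap B) = P(A)\,P(B)$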
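
As a quick check of the probability rules above, the following sketch enumerates the 36 equally likely outcomes of two fair dice. The events A (first die even) and B (sum equals 7) are hypothetical choices made for the example.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs from two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on an outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def a(o):
    return o[0] % 2 == 0      # A: the first die is even

def b(o):
    return sum(o) == 7        # B: the two dice sum to 7

p_a = prob(a)
p_b = prob(b)
p_a_and_b = prob(lambda o: a(o) and b(o))
p_a_or_b = prob(lambda o: a(o) or b(o))
p_b_given_a = p_a_and_b / p_a            # definition of conditional probability
independent = p_a_and_b == p_a * p_b     # independence criterion
```

Here P(B | A) equals P(B), so these two events turn out to be independent.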

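Mean or expected value of a discrete random variable X: $\mu_X = E(X) = \sum_i x_i\,P(X = x_i)$

Variance of a discrete random variable X: $\sigma_X^2 = E[(X - \mu_X)^2] = \sum_i (x_i - \mu_X)^2\,P(X = x_i)$

Expected value of a function of two discrete r.v: $E[g(X, Y)] = \sum_x \sum_y g(x, y)\,P(X = x, Y = y)$

Covariance of two discrete r.v: $\sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)] = \sum_x \sum_y (x - \mu_X)(y - \mu_Y)\,P(X = x, Y = y)$

Correlation of two r.v: $\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$

Mean and variance of a linear combination of r.v: Let $W = a + bX + cY$. Then $\mu_W = a + b\mu_X + c\mu_Y$ and $\sigma_W^2 = b^2\sigma_X^2 + c^2\sigma_Y^2 + 2bc\,\sigma_{XY}$

Independent r.v: If r.v. X and Y are independent, then $E(XY) = E(X)\,E(Y)$, so $\sigma_{XY} = 0$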
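
The sketch below works through the random-variable formulas above for a small joint distribution. The joint probabilities and the linear combination W = a + bX + cY with a = 1, b = 2, c = 3 are assumptions chosen only for illustration.

```python
from fractions import Fraction as F
import math

# Hypothetical joint distribution P(X = x, Y = y).
joint = {(0, 0): F(2, 5), (0, 1): F(1, 10), (1, 0): F(1, 10), (1, 1): F(2, 5)}

def expect(g):
    """E[g(X, Y)]: g(x, y) weighted by the joint probabilities."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - mu_x) ** 2)
var_y = expect(lambda x, y: (y - mu_y) ** 2)
cov_xy = expect(lambda x, y: (x - mu_x) * (y - mu_y))
rho = float(cov_xy) / math.sqrt(var_x * var_y)

# Linear combination W = a + b*X + c*Y, via the formulas ...
a, b, c = 1, 2, 3
mean_w = a + b * mu_x + c * mu_y
var_w = b ** 2 * var_x + c ** 2 * var_y + 2 * b * c * cov_xy

# ... and directly from the distribution of W, as a cross-check.
mean_w_direct = expect(lambda x, y: a + b * x + c * y)
var_w_direct = expect(lambda x, y: (a + b * x + c * y - mean_w_direct) ** 2)
```

Both routes give the same mean and variance for W. The covariance is nonzero here, so X and Y are not independent in this example.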

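Omitted variable bias: If the correct model is $y = \beta_0 + \beta_1 x + \beta_2 z + \varepsilon$ (1), and the incomplete model is $y = \alpha_0 + \alpha_1 x + u$ (2), the omitted variable bias is defined as $\alpha_1 - \beta_1$. Considering the auxiliary model $z = \delta_0 + \delta_1 x + v$ (3), we can compute dy/dx from (2) as $\alpha_1$ or from (1) and (3) as $\beta_1 + \beta_2\delta_1$. It follows that the bias equals $\beta_2\delta_1$.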
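
The omitted variable bias result can be checked numerically, because the same decomposition holds exactly for OLS estimates in any sample. The sketch below assumes one included regressor x and one omitted regressor z; the variable names and the data are hypothetical.

```python
def centered_sum(u, v):
    """Sum of (u_i - mean(u)) * (v_i - mean(v))."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))

# Hypothetical data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
z = [2, 1, 4, 3, 6]
y = [3, 4, 8, 8, 13]

sxx, szz, sxz = centered_sum(x, x), centered_sum(z, z), centered_sum(x, z)
sxy, szy = centered_sum(x, y), centered_sum(z, y)

# Full regression of y on x and z (closed form for two regressors).
det = sxx * szz - sxz ** 2
b1 = (sxy * szz - szy * sxz) / det   # coefficient on x
b2 = (szy * sxx - sxy * sxz) / det   # coefficient on z

# Incomplete regression of y on x alone, and auxiliary regression of z on x.
short_slope = sxy / sxx
delta1 = sxz / sxx

bias = short_slope - b1              # equals b2 * delta1
```

The bias vanishes when either the omitted variable has no effect on y (b2 = 0) or it is uncorrelated with x in the sample (delta1 = 0).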

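Binomial mean and standard deviation: If a count X has the binomial distribution B(n, p), then $\mu_X = np$ and $\sigma_X = \sqrt{np(1-p)}$

Mean and standard deviation of a sample proportion: $\mu_{\hat{p}} = p$ and $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$

Normal approximation for counts and proportions: X is approximately $N\left(np, \sqrt{np(1-p)}\right)$ and $\hat{p}$ is approximately $N\left(p, \sqrt{\frac{p(1-p)}{n}}\right)$

Binomial probability formula: $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$ where $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$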
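
The binomial formulas above and their normal approximation can be compared directly. This is a minimal sketch using only the Python standard library; the values n = 100, p = 0.5 and the cutoff 55 are assumptions for the example, and no continuity correction is applied.

```python
import math
from statistics import NormalDist

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_mean_sd(n, p):
    """Mean and standard deviation of a B(n, p) count."""
    return n * p, math.sqrt(n * p * (1 - p))

n, p = 100, 0.5
mean, sd = binom_mean_sd(n, p)

# Exact P(X <= 55) versus the normal approximation N(np, sqrt(np(1-p))).
exact = sum(binom_pmf(k, n, p) for k in range(56))
approx = NormalDist(mean, sd).cdf(55)
```

The two probabilities agree to within a few percentage points here; the approximation improves as n grows.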

Mean and standard deviation of a sample mean: $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

Sampling distribution of the sample mean: a) If a population has the $N(\mu, \sigma)$ distribution, then the sample mean $\bar{x}$ of n independent observations has the $N(\mu, \sigma/\sqrt{n})$ distribution. b) If the population has any distribution with mean $\mu$ and standard deviation $\sigma$ and the sample size n is large, the sampling distribution of the sample mean is approximately $N(\mu, \sigma/\sqrt{n})$.

Confidence interval for the population mean: a) If the standard deviation $\sigma$ is known, a level C confidence interval for $\mu$ is $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$, where $z^*$ is the value on the standard normal curve with area C between $-z^*$ and $z^*$. b) If $\sigma$ is unknown, a level C confidence interval for $\mu$ is $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$, where $t^*$ is the value for the t(n – 1) density curve with area C between $-t^*$ and $t^*$.
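
Case a) above (known standard deviation) can be sketched with the standard library alone. The function name `z_interval` and the numbers are assumptions for illustration. Case b) needs a t critical value, which the standard library does not provide, so it is not shown.

```python
import math
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, level=0.95):
    """Level-C confidence interval for the mean when sigma is known."""
    # Critical value with area `level` between -z_star and z_star.
    z_star = NormalDist().inv_cdf((1 + level) / 2)
    margin = z_star * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical sample: mean 10 from n = 16 observations, sigma = 2.
low, high = z_interval(10.0, 2.0, 16, level=0.95)
```

With these inputs the interval is roughly 10 plus or minus 0.98.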

Test of the hypothesis $H_0: \mu = \mu_0$ based on an SRS of size n from a population with unknown mean $\mu$: a) If $\sigma$ is known, compute the test statistic $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$. In terms of a standard normal random variable Z, the P-value for a test of $H_0$ against $H_a: \mu \neq \mu_0$ is $2P(Z \geq |z|)$. b) If $\sigma$ is unknown, compute the test statistic $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$. In terms of a random variable T having the t(n – 1) distribution, the P-value for a test of $H_0$ against $H_a: \mu \neq \mu_0$ is $2P(T \geq |t|)$.
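
A minimal sketch of the z test in case a) follows. The function name `z_test` and its `alternative` parameter are assumptions for the example; the one-sided options are included for completeness and use the corresponding single tail of the standard normal distribution.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n, alternative="two-sided"):
    """Return (z, P-value) for H0: mu = mu0 when sigma is known."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    std_normal = NormalDist()
    if alternative == "greater":      # Ha: mu > mu0
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":       # Ha: mu < mu0
        p_value = std_normal.cdf(z)
    else:                             # Ha: mu != mu0
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_value

# Hypothetical sample: mean 10.98 from n = 16 observations, sigma = 2.
z, p_two_sided = z_test(10.98, 10.0, 2.0, 16)
_, p_greater = z_test(10.98, 10.0, 2.0, 16, alternative="greater")
```

Here z is about 1.96, so the two-sided P-value is close to 0.05 and the one-sided P-value is half of that.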

Power of a test against a particular alternative hypothesis: the probability that a fixed level $\alpha$ significance test will reject $H_0$ when the particular alternative hypothesis is true.
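
For a concrete case, the sketch below computes the power of a one-sided z test of H0: mu = mu0 against Ha: mu > mu0 with known sigma. The test rejects when z is at least the upper-alpha critical value, so the power at a specific alternative mean is the probability of that event when the true mean is the alternative value. The function name and the numbers are assumptions for illustration.

```python
import math
from statistics import NormalDist

def power_one_sided_z(mu0, mu_alt, sigma, n, alpha=0.05):
    """Power of the level-alpha z test of H0: mu = mu0 vs Ha: mu > mu0
    when the true mean is mu_alt and sigma is known."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)           # rejection cutoff for z
    shift = (mu_alt - mu0) / (sigma / math.sqrt(n))   # mean of z under mu_alt
    return 1 - std_normal.cdf(z_alpha - shift)

# Hypothetical setting: mu0 = 0, alternative mu = 1, sigma = 2, n = 16.
power = power_one_sided_z(0.0, 1.0, 2.0, 16, alpha=0.05)
```

When the alternative equals the null value, the power reduces to alpha, which is a useful sanity check.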

Two-sample t procedures: If an SRS of size n1 is drawn from a normal population with unknown mean $\mu_1$ and an independent SRS of size n2 is drawn from a normal population with unknown mean $\mu_2$, then we test the hypothesis $H_0: \mu_1 = \mu_2$ by computing the statistic $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$. The P-values are computed from the t(k) distribution, where the degrees of freedom k can be approximated as the smaller of n1 – 1 and n2 – 1. A confidence interval for $\mu_1 - \mu_2$ is given by $(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$.
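
The two-sample statistic and the conservative degrees of freedom can be computed as follows. This sketch stops at the statistic, since the P-value requires a t distribution table or a statistics package; the function name and summary statistics are assumptions for the example.

```python
import math

def two_sample_t(mean1, s1, n1, mean2, s2, n2):
    """Return (t statistic, conservative degrees of freedom) for
    H0: mu1 = mu2, without pooling the two sample variances."""
    standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    t = (mean1 - mean2) / standard_error
    df = min(n1 - 1, n2 - 1)
    return t, df

# Hypothetical summary statistics for two independent samples.
t_stat, df = two_sample_t(5.0, 2.0, 16, 4.0, 2.0, 16)
```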

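Confidence intervals for population proportions: An approximate level C confidence interval for the population proportion p when n is large is $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$. An approximate level C confidence interval for the difference in population proportions p1 – p2 when n1 and n2 are large is $(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$.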

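Significance tests for population proportions: To test the hypothesis $H_0: p = p_0$ based on an SRS of size n from a large population with unknown proportion p of successes, compute the statistic $z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$. To test the hypothesis $H_0: p_1 = p_2$ based on two independent SRS of sizes n1 and n2, compute $z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, where $\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$ is the pooled estimate of p and $X_1$, $X_2$ are the success counts in the two samples.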
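
The proportion procedures above can be sketched as follows: a large-sample confidence interval for one proportion and the pooled z statistic for comparing two. The function names and the counts are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, level=0.95):
    """Approximate level-C confidence interval for a proportion."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf((1 + level) / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2 using the pooled estimate of p."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / standard_error

# Hypothetical counts: 60 of 100 successes versus 45 of 100.
low, high = proportion_interval(60, 100)
z = two_proportion_z(60, 100, 45, 100)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided alternative
```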

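Statistical model for multiple linear regression: $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \varepsilon_i$, where the $\varepsilon_i$’s are independent and normally distributed with mean 0 and standard deviation $\sigma$. The regression coefficients $\beta_0, \beta_1, \dots, \beta_k$ are estimated by the ordinary least squares (OLS) coefficients $b_0, b_1, \dots, b_k$. The variance of the error term $\sigma^2$ is estimated by $s^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - k - 1}$, where $e_i = y_i - \hat{y}_i$ are the residuals.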

Inference for individual regression coefficients: In a linear regression with k explanatory variables, estimated from a sample of size n, a level C confidence interval for $\beta_j$ is $b_j \pm t^* SE_{b_j}$, where $SE_{b_j}$ is the standard error of $b_j$ and $t^*$ is the value for the t(n – k – 1) density curve with area C between $-t^*$ and $t^*$. To test the hypothesis $H_0: \beta_j = 0$, compute the statistic $t = \frac{b_j}{SE_{b_j}}$. In terms of a random variable T having the t(n – k – 1) distribution, the P-value for a test of $H_0$ against $H_a: \beta_j \neq 0$ is $2P(T \geq |t|)$. In the case of simple regression (k = 1), the standard error of the slope coefficient is $SE_{b_1} = \frac{s}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}}$.
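
For the simple regression case (k = 1), the slope, its standard error and the t statistic can be computed by hand as in the sketch below. The function name and the data are assumptions for the example; the resulting t statistic would be compared with the t(n – 2) distribution.

```python
import math

def slope_inference(x, y):
    """Return (b0, b1, s, SE of b1, t statistic) for a simple regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Residual sum of squares, divided by n - k - 1 = n - 2.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    se_b1 = s / math.sqrt(sxx)
    return b0, b1, s, se_b1, b1 / se_b1

# Hypothetical data, chosen only for illustration.
b0, b1, s, se_b1, t_stat = slope_inference([1, 2, 3, 4], [2, 4, 5, 9])
```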

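ANOVA F test: To test the hypothesis $H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$, compute the statistic $F = \frac{MSM}{MSE} = \frac{SSM/k}{SSE/(n-k-1)}$, where SSM is the model sum of squares and SSE is the error sum of squares. The P-value is the probability that a random variable having the F(k, n – k – 1) distribution is greater than or equal to the calculated value of the F statistic. The test statistic can also be written as $F = \frac{R^2/k}{(1-R^2)/(n-k-1)}$, where $R^2$ is the multiple coefficient of determination.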
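
The two forms of the F statistic above give the same value, since R^2 = SSM / (SSM + SSE). A short sketch with hypothetical sums of squares makes the equivalence concrete.

```python
def f_from_sums(ssm, sse, n, k):
    """ANOVA F statistic from the model and error sums of squares."""
    return (ssm / k) / (sse / (n - k - 1))

def f_from_r2(r2, n, k):
    """The same F statistic written in terms of R squared."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Hypothetical regression: n = 23 observations, k = 2 explanatory variables.
ssm, sse, n, k = 40.0, 40.0, 23, 2
r2 = ssm / (ssm + sse)
f_sums = f_from_sums(ssm, sse, n, k)
f_r2 = f_from_r2(r2, n, k)
```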

Test for m linear restrictions to the regression parameters: To test whether m of the k coefficients of the regression take specific values (e.g. $H_0: \beta_1 = \beta_2 = \dots = \beta_m = 0$), compute the statistic $F = \frac{(R_U^2 - R_R^2)/m}{(1 - R_U^2)/(n-k-1)}$, where $R_U^2$ is the coefficient of determination of the unrestricted regression and $R_R^2$ is the coefficient of determination of the restricted regression. The latter incorporates all the linear restrictions specified in the null hypothesis. The P-value is the probability that a random variable having the F(m, n – k – 1) distribution is greater than or equal to the calculated value of the F statistic.
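
A minimal sketch of the restricted-versus-unrestricted F statistic follows. The function name and the R-squared values are assumptions for illustration, and the sketch presumes both regressions use the same dependent variable, so that their R-squared values are comparable.

```python
def restricted_f(r2_unrestricted, r2_restricted, n, k, m):
    """F statistic for m linear restrictions in a regression with
    k explanatory variables estimated from n observations."""
    numerator = (r2_unrestricted - r2_restricted) / m
    denominator = (1 - r2_unrestricted) / (n - k - 1)
    return numerator / denominator

# Hypothetical values: dropping one of two regressors lowers R^2 from 0.5 to 0.4.
f_one = restricted_f(0.5, 0.4, n=23, k=2, m=1)

# Restricting all k slopes to zero gives R^2 = 0 in the restricted model,
# which reproduces the overall ANOVA F statistic.
f_all = restricted_f(0.5, 0.0, n=23, k=2, m=2)
```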
