Simple linear regression
Tron Anders Moger 3.10.2007
Example 6: Population proportions
One sample
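- Assume $X \sim \mathrm{Bin}(n, P)$, so that $\hat{P} = X/n$ is a frequency.
- Then
$$\frac{\hat{P} - P}{\sqrt{P(1 - P)/n}} \sim N(0,1)$$
(approximately, for large n)
- Thus
$$\frac{\hat{P} - P}{\sqrt{\hat{P}(1 - \hat{P})/n}} \sim N(0,1)$$
(approximately, for large n)
- Thus: Confidence interval for P
$$\hat{P} - Z_{\alpha/2}\sqrt{\frac{\hat{P}(1 - \hat{P})}{n}} < P < \hat{P} + Z_{\alpha/2}\sqrt{\frac{\hat{P}(1 - \hat{P})}{n}}$$
that is, the interval
$$\left(\hat{P} - Z_{\alpha/2}\sqrt{\frac{\hat{P}(1 - \hat{P})}{n}},\; \hat{P} + Z_{\alpha/2}\sqrt{\frac{\hat{P}(1 - \hat{P})}{n}}\right)$$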
Example 6 (Hypothesis testing)
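- Hypotheses: $H_0: P = P_0$, $H_1: P \neq P_0$
- Test statistic
$$\frac{\hat{P} - P_0}{\sqrt{P_0(1 - P_0)/n}} \sim N(0,1)$$
under $H_0$, for large n
- Reject $H_0$ if
$$\frac{\hat{P} - P_0}{\sqrt{P_0(1 - P_0)/n}} < -Z_{\alpha/2} \quad\text{or if}\quad \frac{\hat{P} - P_0}{\sqrt{P_0(1 - P_0)/n}} > Z_{\alpha/2}$$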
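The one-sample interval and test can be sketched in a few lines of Python. This is a minimal sketch using only the standard library; the function name and the data (60 successes in 100 trials, P0 = 0.5) are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_proportion(x, n, p0, alpha=0.05):
    """Normal-approximation CI for P and two-sided z-test of H0: P = p0."""
    p_hat = x / n
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{alpha/2}

    # The confidence interval uses the estimated standard error.
    se_hat = sqrt(p_hat * (1 - p_hat) / n)
    ci = (p_hat - z_crit * se_hat, p_hat + z_crit * se_hat)

    # The test statistic uses the standard error under H0.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    reject = abs(z) > z_crit
    return p_hat, ci, z, p_value, reject


# Hypothetical data: 60 successes in 100 trials, H0: P = 0.5.
p_hat, ci, z, p_value, reject = one_sample_proportion(60, 100, 0.5)
print(f"p_hat={p_hat:.3f}  CI=({ci[0]:.3f}, {ci[1]:.3f})  z={z:.2f}  p={p_value:.4f}  reject={reject}")
```

Note that the interval and the test use different standard errors: the interval plugs in the estimate, the test plugs in P0.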
Example 7: Differences between population proportions-two samples
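- Assume $X_1 \sim \mathrm{Bin}(n_1, P_1)$ and $X_2 \sim \mathrm{Bin}(n_2, P_2)$, so that $\hat{P}_1 = X_1/n_1$ and $\hat{P}_2 = X_2/n_2$ are frequencies
- Then
$$\frac{\hat{P}_1 - \hat{P}_2 - (P_1 - P_2)}{\sqrt{\frac{P_1(1 - P_1)}{n_1} + \frac{P_2(1 - P_2)}{n_2}}} \sim N(0,1)$$
(approximately)
- Confidence interval for $P_1 - P_2$
$$\hat{P}_1 - \hat{P}_2 \pm Z_{\alpha/2}\sqrt{\frac{\hat{P}_1(1 - \hat{P}_1)}{n_1} + \frac{\hat{P}_2(1 - \hat{P}_2)}{n_2}}$$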
Example 7 (Hypothesis testing)
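- Hypotheses: $H_0: P_1 = P_2$, $H_1: P_1 \neq P_2$
- Test statistic
$$\frac{\hat{P}_1 - \hat{P}_2}{\sqrt{\frac{\hat{P}_0(1 - \hat{P}_0)}{n_1} + \frac{\hat{P}_0(1 - \hat{P}_0)}{n_2}}} \sim N(0,1)$$
where
$$\hat{P}_0 = \frac{n_1\hat{P}_1 + n_2\hat{P}_2}{n_1 + n_2}$$
- Reject $H_0$ if
$$\left|\frac{\hat{P}_1 - \hat{P}_2}{\sqrt{\frac{\hat{P}_0(1 - \hat{P}_0)}{n_1} + \frac{\hat{P}_0(1 - \hat{P}_0)}{n_2}}}\right| > Z_{\alpha/2}$$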
- Spontaneous abortions among surgical nurses and other nurses
- Want to test whether there is a difference between the proportions of abortions in the two groups
- $H_0: P_{\text{op.nurses}} = P_{\text{others}}$, $H_1: P_{\text{op.nurses}} \neq P_{\text{others}}$

                     Surgical nurses   Other nurses
No. interviewed      67                92
No. pregnancies      36                34
No. abortions        10                3
Percent abortions    27.8              8.8
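Calculation:
- $\hat{P}_1 = 0.278$, $\hat{P}_2 = 0.088$, $n_1 = 36$, $n_2 = 34$
$$\hat{P}_0 = \frac{\text{Total no. abortions}}{\text{Total no. pregnancies}} = \frac{10 + 3}{36 + 34} = 0.186$$
$$z = \frac{0.278 - 0.088}{\sqrt{\left(\frac{1}{36} + \frac{1}{34}\right)0.186(1 - 0.186)}} = 2.04$$
- P-value 0.0414 = 4.1%, reject $H_0$ at the 5% significance level (can't do this in SPSS)
- 95% confidence interval for $P_1 - P_2$:
$$(\hat{P}_1 - \hat{P}_2) \pm 1.96\sqrt{\frac{\hat{P}_1(1 - \hat{P}_1)}{n_1} + \frac{\hat{P}_2(1 - \hat{P}_2)}{n_2}} = 0.190 \pm 0.175 = (0.015, 0.365)$$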
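Since SPSS does not offer this test, the calculation for the nurses example can be checked with a short script. This is a minimal sketch using only the Python standard library; the function name is made up for illustration. It works with the unrounded proportions 10/36 and 3/34, so results may differ from the hand calculation in the third decimal.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_proportions(x1, n1, x2, n2, alpha=0.05):
    """Two-sided z-test of H0: P1 = P2 and a CI for P1 - P2 (normal approximation)."""
    p1, p2 = x1 / n1, x2 / n2
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)

    # Test: pooled estimate of the common proportion under H0.
    p0 = (x1 + x2) / (n1 + n2)
    z = (p1 - p2) / sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Confidence interval: unpooled standard error.
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    ci = (p1 - p2 - z_crit * se, p1 - p2 + z_crit * se)
    return z, p_value, ci


# Abortions / pregnancies: surgical nurses 10/36, other nurses 3/34.
z, p_value, ci = two_sample_proportions(10, 36, 3, 34)
print(f"z={z:.2f}  p-value={p_value:.4f}  95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
```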
Repetition:
- Testing:
- Identify data; continuous -> t-tests; proportions -> Normal approx. to binomial dist.
- If continuous: one-sample, matched pairs, two independent samples?
- Assumptions: Are data normally distributed? If two ind. samples, equal variances in both groups?
- Formulate H0 and H1 (H0 is always no difference, no effect of treatment etc.), choose sig. level (α=5%)
- Calculate test statistic
Inference:
- Test statistic usually standardized: (estimator - expected value of estimator under H0)/(estimated standard error)
- Gives you a location on the x-axis in a distribution
- Compare this value to the 2.5th and 97.5th percentiles of the distribution
- If smaller than the 2.5th percentile or larger than the 97.5th percentile, reject H0
- P-value: area in the tails of the distribution, i.e. the area below -|test statistic| plus the area above |test statistic| (two-sided testing)
- If the P-value is smaller than 0.05, reject H0
- If the confidence interval for the mean or mean difference (which one depends on the test you use) does not include the H0 value, reject H0
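The percentile rule and the P-value rule always give the same decision. A minimal sketch, assuming the test statistic follows a standard normal distribution under H0 (for t-tests the percentiles would come from the t distribution instead); the function name is made up for illustration.

```python
from statistics import NormalDist


def two_sided_decision(statistic, alpha=0.05):
    """Compare a standardized test statistic with the N(0,1) percentiles and compute the P-value."""
    dist = NormalDist()  # assumption: N(0,1) is the reference distribution under H0
    lower = dist.inv_cdf(alpha / 2)        # 2.5th percentile for alpha = 0.05
    upper = dist.inv_cdf(1 - alpha / 2)    # 97.5th percentile
    reject_by_percentile = statistic < lower or statistic > upper

    # Two-sided P-value: area below -|statistic| plus area above |statistic|.
    p_value = 2 * (1 - dist.cdf(abs(statistic)))
    reject_by_p_value = p_value < alpha
    return lower, upper, p_value, reject_by_percentile, reject_by_p_value


lower, upper, p_value, by_percentile, by_p = two_sided_decision(2.04)
print(f"percentiles=({lower:.2f}, {upper:.2f})  p-value={p_value:.4f}  reject={by_percentile}")
```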
Last week:
- Looked at continuous, normally distributed variables
- Used t-tests to see if there was a significant difference between means in two groups
- How strong is the relationship between two such variables? Correlation
- What if one wants to study the relationship between several such variables? Linear regression
Connection between variables
[Figure: two scatter plots; one shows kostnad (cost) against areal (area)]
We would like to study the connection between x and y!
Data from the first obligatory assignment:
- Birth weight and smoking
- Children of 189 women
- Low birth weight is a medical risk factor
- Does mother's smoking status have any influence on the birth weight?
- Also interested in relationship with other variables: Mother's age, mother's weight, high blood pressure, ethnicity etc.
Is birth weight normally distributed?
From Explore in SPSS:
[Figure: histogram of birthweight (x-axis: birthweight, y-axis: Frequency). Mean = 2944.6561, Std. Dev. = 729.02242, N = 189]
Q-Q plot (check "Normality plots with tests" under Plots):
[Figure: Normal Q-Q Plot of birthweight (x-axis: Observed Value, y-axis: Expected Normal)]
Tests for normality:
Tests of Normality
               Kolmogorov-Smirnov(a)          Shapiro-Wilk
               Statistic   df    Sig.         Statistic   df    Sig.
birthweight    0.043       189   0.200(*)     0.992       189   0.438
* This is a lower bound of the true significance.
a Lilliefors Significance Correction
The null hypothesis is that the data are normal. A large p-value means there is no evidence against a normal distribution. For large samples, the p-value tends to be low even for small deviations from normality. The graphical methods are more important.
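Because the graphical check matters most, it helps to see what a Q-Q plot actually plots: the sorted observations against the values expected from a normal distribution. A minimal sketch follows. The data are simulated, not the real birth weights, and the plotting positions (i - 0.5)/n are an assumption, since statistics packages use slightly different conventions.

```python
import random
from statistics import NormalDist


def normal_qq_points(data):
    """Return (observed value, expected standard normal score) pairs for a Q-Q plot."""
    n = len(data)
    std_normal = NormalDist()
    observed = sorted(data)
    # Assumed plotting positions: (i - 0.5) / n for the i-th smallest observation.
    expected = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(observed, expected))


# Simulated sample that only mimics the size, mean and SD reported for birthweight.
random.seed(1)
sample = [random.gauss(2945, 729) for _ in range(189)]
points = normal_qq_points(sample)

# For normal data the points lie close to a straight line.
for obs, exp in points[::47]:
    print(f"observed={obs:8.1f}  expected normal={exp:6.2f}")
```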
Pearson's correlation coefficient r
- Measures the linear relationship between variables
- r=1: All data lie on an increasing straight line
- r=-1: All data lie on a decreasing straight line
- r=0: No linear relationship
- In linear regression, R2 (r2) is often used as a measure of the explanatory power of the model
- R2 close to 1 means that the observations are close to the line; R2 close to 0 means that there is no linear relationship between the observations
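The three reference cases r=1, r=-1 and r=0 can be reproduced with a small calculation. A minimal sketch with made-up data; `pearson_r` is a hypothetical helper implementing the usual sample correlation formula.

```python
from math import sqrt


def pearson_r(x, y):
    """Sample Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2 * v + 1 for v in x])     # points on an increasing line
r_down = pearson_r(x, [10 - 3 * v for v in x])  # points on a decreasing line
r_none = pearson_r(x, [2, 1, 4, 1, 2])          # symmetric pattern, no linear trend

print(r_up, r_down, r_none)
print("R2 for the three cases:", r_up ** 2, r_down ** 2, r_none ** 2)
```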
Testing for correlation
- It is also possible to test whether a sample correlation r is large enough to indicate a nonzero population correlation
- Test statistic:
$$\frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} \sim t_{n-2}$$
- Note: The test only works for normal distributions and linear correlations: Always also investigate scatter plot!
Pearson's correlation coefficient in SPSS:
- Analyze->Correlate->Bivariate, check Pearson
- Tests if r is significantly different from 0
- Null hypothesis is that the population correlation is 0
- The variables have to be normally distributed
- Independence between observations
Example:
[Figure: scatter plot of birthweight against weight in pounds]
Correlation from SPSS:
Correlations
                                         birthweight   weight in pounds
birthweight        Pearson Correlation   1             0.186*
                   Sig. (2-tailed)                     0.010
                   N                     189           189
weight in pounds   Pearson Correlation   0.186*        1
                   Sig. (2-tailed)       0.010
                   N                     189           189
*. Correlation is significant at the 0.05 level (2-tailed).
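The significance reported by SPSS can be checked roughly by hand with the test statistic t = r·sqrt(n - 2)/sqrt(1 - r²), using r = 0.186 and n = 189. The sketch below is only approximate: the Python standard library has no t distribution, so the P-value is computed from the standard normal distribution, which is close to the t distribution with 187 degrees of freedom.

```python
from math import sqrt
from statistics import NormalDist


def correlation_t_statistic(r, n):
    """Test statistic for H0: population correlation = 0; t-distributed with n - 2 df under H0."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


t = correlation_t_statistic(0.186, 189)

# Assumption: N(0,1) is used in place of the t distribution with 187 df.
p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
print(f"t = {t:.2f}, approximate two-sided P-value = {p_approx:.3f}")
```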
If the data are not normally distributed: Spearman's rank correlation, rs
- Measures all monotonic relationships, not only linear ones
- No distribution assumptions
- rs is between -1 and 1, similar to Pearson's correlation coefficient
- In SPSS: Analyze->Correlate->Bivariate, check Spearman
- Also provides a test on whether rs is different from 0
Spearman correlation:
Correlations (Spearman's rho)
                                             birthweight   weight in pounds
birthweight        Correlation Coefficient   1.000         0.248**
                   Sig. (2-tailed)           .             0.001
                   N                         189           189
weight in pounds   Correlation Coefficient   0.248**       1.000
                   Sig. (2-tailed)           0.001         .
                   N                         189           189
**. Correlation is significant at the 0.01 level (2-tailed).
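Spearman's rs is Pearson's r computed on the ranks of the observations, which is why it picks up any monotonic relationship. A minimal sketch with made-up data; the helper names are hypothetical, and ties get the average of their ranks.

```python
from math import sqrt


def average_ranks(values):
    """Rank the values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Made-up data: y increases with x, but not along a straight line.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]
print(f"Pearson r = {pearson_r(x, y):.3f}, Spearman rs = {spearman_rs(x, y):.3f}")
```

In this example rs equals 1 because the relationship is perfectly monotonic, while r is below 1 because it is not linear.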
Linear regression
- Wish to fit a line as close to the observed data (two normally distributed variables) as possible
- Example: Birth weight = a + b*mother's weight
- In SPSS: Analyze->Regression->Linear
- Click Statistics and check Confidence interval for B
- Choose one variable (Birth weight) as dependent, and one variable (mother's weight) as independent
- Important to know which variable is your dependent variable!
Connection between variables
[Figure: scatter plot of kostnad (cost) against areal (area)]
Fit a line!
The standard simple regression model
- We define a model
$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$
where $\varepsilon_i$ are independent, normally distributed, with equal variance $\sigma^2$
- We can then use data to estimate the model parameters, and to make statements about their uncertainty
What can you do with a fitted line?
- Interpolation
- Extrapolation (sometimes dangerous!)
- Interpret the parameters of the line
How to define the line that "fits best"?
Choose the line that minimizes the sum of the squares of the "errors" = Least squares method!
- Note: Many other ways to fit the line can be imagined
How to compute the line fit with the least squares method?
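- Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ denote the points in the plane.
- Find a and b so that y=a+bx fits the points by minimizing
$$S = (a + bx_1 - y_1)^2 + (a + bx_2 - y_2)^2 + \cdots + (a + bx_n - y_n)^2 = \sum_{i=1}^{n}(a + bx_i - y_i)^2$$
- Solution:
$$b = \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{n\sum x_i^2 - \left(\sum x_i\right)^2} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}$$
$$a = \frac{\sum y_i - b\sum x_i}{n} = \bar{y} - b\bar{x}$$
where $\bar{x} = \frac{1}{n}\sum x_i$, $\bar{y} = \frac{1}{n}\sum y_i$ and all sums are done for i=1,...,n.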
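The solution formulas translate directly into code. A minimal sketch with made-up data; `least_squares_line` is a hypothetical helper that computes b from the sums and then a from b, exactly as in the formulas for the least squares line y = a + bx.

```python
def least_squares_line(x, y):
    """Return intercept a and slope b of the least squares line y = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


def sum_of_squares(a, b, x, y):
    """S = sum of (a + b*x_i - y_i)^2, the quantity being minimized."""
    return sum((a + b * xi - yi) ** 2 for xi, yi in zip(x, y))


# Made-up data points.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares_line(x, y)
print(f"a = {a:.2f}, b = {b:.2f}, S = {sum_of_squares(a, b, x, y):.2f}")

# Any other line gives a larger S, for example after nudging a or b.
print(sum_of_squares(a + 0.1, b, x, y), sum_of_squares(a, b - 0.1, x, y))
```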
How do you get this answer?
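- Differentiate S with respect to a and b, and set the result to 0
$$\frac{\partial S}{\partial a} = \sum_{i=1}^{n} 2(a + bx_i - y_i) = 0$$
$$\frac{\partial S}{\partial b} = \sum_{i=1}^{n} 2(a + bx_i - y_i)x_i = 0$$
We get:
$$an + b\left(\sum x_i\right) - \sum y_i = 0$$
$$a\left(\sum x_i\right) + b\left(\sum x_i^2\right) - \sum x_i y_i = 0$$
These are two equations in two unknowns, and solving them gives the answer.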
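One way to carry out the solving step is sketched below: solve the first equation for a, then substitute into the second. Here $\bar{x}$ and $\bar{y}$ denote the sample means, so that $\sum x_i = n\bar{x}$ and $\sum y_i = n\bar{y}$.

```latex
\begin{aligned}
an + b\sum x_i - \sum y_i = 0
  \;&\Rightarrow\; a = \bar{y} - b\bar{x} \\
a\sum x_i + b\sum x_i^2 - \sum x_i y_i = 0
  \;&\Rightarrow\; (\bar{y} - b\bar{x})\,n\bar{x} + b\sum x_i^2 = \sum x_i y_i \\
  &\Rightarrow\; b\left(\sum x_i^2 - n\bar{x}^2\right) = \sum x_i y_i - n\bar{x}\bar{y} \\
  &\Rightarrow\; b = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}
\end{aligned}
```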
y against x ≠ x against y
- Linear regression of y against x does not give the same result as the opposite.
[Figure: the same scatter plot (x-axis: x, y-axis: y) with two different fitted lines, "Regression of y against x" and "Regression of x against y"]
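The difference between the two regressions is easy to see numerically. A minimal sketch with made-up data: the same least squares formula is applied twice, once with y as the dependent variable and once with x, and the second line is rewritten in the form y = ... so the two slopes can be compared in the same coordinate system.

```python
def least_squares_line(x, y):
    """Intercept and slope of the least squares line for y (dependent) against x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return intercept, slope


# Made-up data points.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a_yx, b_yx = least_squares_line(x, y)  # y = a_yx + b_yx * x
c_xy, d_xy = least_squares_line(y, x)  # x = c_xy + d_xy * y

# Rewrite x = c + d*y as y = -c/d + (1/d)*x to compare with the first line.
print(f"y against x: y = {a_yx:.2f} + {b_yx:.2f} x")
print(f"x against y: y = {-c_xy / d_xy:.2f} + {1 / d_xy:.2f} x")
```

The two lines coincide only when all points lie exactly on a straight line.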
Analyzing the variance
- Define
- SSE: Error sum of squares $\sum (a + bx_i - y_i)^2$
- SSR: Regression sum of squares $\sum (a + bx_i - \bar{y})^2$
- SST: Total sum of squares $\sum (y_i - \bar{y})^2$
- We can show that SST = SSR + SSE
- Define
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} = \mathrm{corr}(x, y)^2$$
- R2 is the "coefficient of determination"
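The decomposition SST = SSR + SSE and the identity R2 = corr(x, y)² can be verified numerically. A minimal sketch with made-up data; the line is fitted by least squares, which is what makes the decomposition hold.

```python
from math import sqrt

# Made-up data points.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Least squares line y = a + b*x.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
b = sxy / sxx
a = mean_y - b * mean_x

fitted = [a + b * xi for xi in x]
sse = sum((fi - yi) ** 2 for fi, yi in zip(fitted, y))  # error sum of squares
ssr = sum((fi - mean_y) ** 2 for fi in fitted)          # regression sum of squares
sst = syy                                               # total sum of squares

r = sxy / sqrt(sxx * syy)  # Pearson correlation between x and y
print(f"SSE = {sse:.2f}, SSR = {ssr:.2f}, SST = {sst:.2f}")
print(f"SSR/SST = {ssr / sst:.3f}, 1 - SSE/SST = {1 - sse / sst:.3f}, corr^2 = {r ** 2:.3f}")
```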
What is the logic behind R2?
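[Figure: for a point $(x_i, y_i)$ and the fitted value $\hat{y}_i = a + bx_i$, the total deviation $y_i - \bar{y}$ (SST) is split into the error $\varepsilon_i = y_i - \hat{y}_i$ (SSE) and the part explained by the line, $\hat{y}_i - \bar{y}$ (SSR)]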