WHY DOES THE STANDARD DEVIATION FORMULA HAVE n - 1 INSTEAD OF n?

THEORETICAL APPROACH

First let's prove this with theory. We will need a few concepts and formulas from higher-level statistics courses. The expected value, which is basically a more general version of a mean, is a linear operator: $E[aY + b] = aE[Y] + b$, where $Y$ is a random variable and $a$ and $b$ are constants. The variance is a bit different: $\mathrm{Var}[aY + b] = a^2\,\mathrm{Var}[Y]$. The expected value of a random variable squared, AKA its second moment, is $E[Y^2] = \mathrm{Var}[Y] + (E[Y])^2$. Finally, we will make use of the fact that the $X_i$ are independent draws from the same distribution, so they all have the same properties: for example, $E[X_i] = E[X_j] = \mu$ and $\mathrm{Var}[X_i] = \mathrm{Var}[X_j] = \sigma^2$ for any $i$ and $j$.
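As a quick sanity check of these three identities, here is a minimal sketch that evaluates them exactly for one small discrete distribution. The fair six-sided die, the constants `a` and `b`, and the helper names `expected` and `variance` are all assumptions chosen for illustration; they do not come from the text.

```python
from fractions import Fraction

# Hypothetical example distribution: a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
prob = Fraction(1, 6)


def expected(f):
    """Return E[f(Y)] for the discrete distribution above."""
    return sum(prob * f(y) for y in values)


def variance(f):
    """Return Var[f(Y)] = E[(f(Y) - E[f(Y)])^2]."""
    m = expected(f)
    return expected(lambda y: (f(y) - m) ** 2)


a, b = 3, 5  # arbitrary constants, chosen only for the demonstration

mean_y = expected(lambda y: y)
var_y = variance(lambda y: y)

# Linearity of expectation: E[aY + b] = a E[Y] + b
assert expected(lambda y: a * y + b) == a * mean_y + b
# Variance scaling: Var[aY + b] = a^2 Var[Y]
assert variance(lambda y: a * y + b) == a ** 2 * var_y
# Second moment: E[Y^2] = Var[Y] + (E[Y])^2
assert expected(lambda y: y ** 2) == var_y + mean_y ** 2

print(mean_y, var_y)  # 7/2 35/12
```

Using `Fraction` keeps the arithmetic exact, so the identities can be compared with `==` rather than a floating-point tolerance.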

Let's use $S_n^2$ to represent the variance using the more intuitive choice of $n$ rather than $n - 1$ in the denominator. We want to find its expected value, and hope it turns out to be $\sigma^2$, because that's what we're trying to estimate.

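$$
\begin{aligned}
E[S_n^2] &= E\left[\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2\right] = \frac{1}{n}E\left[\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2\right], && \text{applying properties of } E[\cdot] \\
nE[S_n^2] &= E\left[\sum_{i=1}^{n}\left(X_i^2 - 2X_i\bar{X} + \bar{X}^2\right)\right], && \text{multiplying both sides by } n \text{; expanding the square} \\
&= E\left[\sum_{i=1}^{n}X_i^2 - \sum_{i=1}^{n}2X_i\bar{X} + \sum_{i=1}^{n}\bar{X}^2\right], && \text{distributing the summation over the parentheses} \\
&= E\left[\sum_{i=1}^{n}X_i^2 - 2\bar{X}\sum_{i=1}^{n}X_i + n\bar{X}^2\right], && \text{applying summation properties (note expressions without } i\text{)} \\
&= E\left[\sum_{i=1}^{n}X_i^2 - 2\bar{X}\,n\bar{X} + n\bar{X}^2\right], && \text{applying the formula for } \bar{X} \\
&= E\left[\sum_{i=1}^{n}X_i^2 - 2n\bar{X}^2 + n\bar{X}^2\right], && \text{simplifying} \\
&= E\left[\sum_{i=1}^{n}X_i^2 - n\bar{X}^2\right], && \text{simplifying} \\
&= E\left[\sum_{i=1}^{n}X_i^2\right] - E\left[n\bar{X}^2\right], && \text{applying properties of } E[\cdot] \\
&= \sum_{i=1}^{n}E\left[X_i^2\right] - nE\left[\bar{X}^2\right], && \text{applying properties of } E[\cdot] \\
&= nE\left[X^2\right] - nE\left[\bar{X}^2\right], && \text{using the fact that } X_i \text{ has the same properties for all } i \\
E[S_n^2] &= E\left[X^2\right] - E\left[\bar{X}^2\right], && \text{dividing both sides by } n
\end{aligned}
$$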

Now we have to deal with each of the squared terms on the right-hand side of the equation. We will use the second moment formula given previously.

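First,

$$
E\left[X^2\right] = \mathrm{Var}[X] + (E[X])^2 = \sigma^2 + \mu^2
$$

Then,

$$
\begin{aligned}
E\left[\bar{X}^2\right] &= \mathrm{Var}[\bar{X}] + (E[\bar{X}])^2 \\
&= \mathrm{Var}\left[\frac{1}{n}\sum_{i=1}^{n}X_i\right] + \mu^2, && \text{substituting the formula for } \bar{X} \\
&= \frac{1}{n^2}\mathrm{Var}\left[\sum_{i=1}^{n}X_i\right] + \mu^2, && \text{applying properties of } \mathrm{Var}[\cdot] \\
&= \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}[X_i] + \mu^2, && \text{applying properties of } \mathrm{Var}[\cdot] \\
&= \frac{1}{n^2}\sum_{i=1}^{n}\sigma^2 + \mu^2 \\
&= \frac{1}{n^2}\left(n\sigma^2\right) + \mu^2 \\
&= \frac{1}{n}\sigma^2 + \mu^2, && \text{simplifying}
\end{aligned}
$$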
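The two intermediate facts used here, $\mathrm{Var}[\bar{X}] = \sigma^2/n$ and $E[\bar{X}^2] = \sigma^2/n + \mu^2$, can be checked exactly on a small case. The sketch below assumes a hypothetical population (a fair six-sided die) and a hypothetical sample size of 3, and enumerates every equally likely ordered sample of independent draws; none of these choices come from the text.

```python
from fractions import Fraction
from itertools import product

values = [1, 2, 3, 4, 5, 6]  # hypothetical population: a fair die
n = 3                        # hypothetical sample size

# Population mean and variance, computed exactly.
mu = Fraction(sum(values), len(values))
sigma2 = sum((v - mu) ** 2 for v in values) / len(values)

# With independent draws, every ordered sample of size n is equally likely.
samples = list(product(values, repeat=n))
xbars = [Fraction(sum(s), n) for s in samples]

e_xbar = sum(xbars) / len(xbars)                      # E[X-bar]
e_xbar_sq = sum(x ** 2 for x in xbars) / len(xbars)   # E[X-bar^2]
var_xbar = e_xbar_sq - e_xbar ** 2                    # Var[X-bar]

assert e_xbar == mu
assert var_xbar == sigma2 / n
assert e_xbar_sq == sigma2 / n + mu ** 2

print(var_xbar, e_xbar_sq)  # 35/36 119/9
```

The enumeration treats the draws as independent, which is the assumption that lets the variance of a sum split into a sum of variances.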

We can now plug these back into the equation we were originally working on.

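$$
\begin{aligned}
E[S_n^2] &= E\left[X^2\right] - E\left[\bar{X}^2\right] \\
&= \left(\sigma^2 + \mu^2\right) - \left(\frac{1}{n}\sigma^2 + \mu^2\right) \\
&= \left(1 - \frac{1}{n}\right)\sigma^2 + \mu^2 - \mu^2, && \text{regrouping} \\
&= \frac{n-1}{n}\sigma^2, && \text{simplifying}
\end{aligned}
$$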
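The result $E[S_n^2] = \frac{n-1}{n}\sigma^2$ can also be confirmed exactly for a small case. This is a minimal sketch under assumed inputs: the population (a fair six-sided die), the sample size of 3, and the helper name `s_n_squared` are illustrative choices, not part of the derivation.

```python
from fractions import Fraction
from itertools import product

values = [1, 2, 3, 4, 5, 6]  # hypothetical population: a fair die
n = 3                        # hypothetical sample size

mu = Fraction(sum(values), len(values))
sigma2 = sum((v - mu) ** 2 for v in values) / len(values)


def s_n_squared(sample):
    """Variance of one sample using n (not n - 1) in the denominator."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample) / len(sample)


# Average S_n^2 over every equally likely ordered sample of size n.
samples = list(product(values, repeat=n))
expected_s_n_squared = sum(s_n_squared(s) for s in samples) / len(samples)

assert expected_s_n_squared == Fraction(n - 1, n) * sigma2

print(expected_s_n_squared, sigma2)  # 35/18 35/12
```

For this case the exact expectation is 35/18, which is two thirds of the population variance 35/12, matching the factor $(n-1)/n$ with $n = 3$.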

Uh-oh! It looks like $S_n^2$ doesn't actually estimate $\sigma^2$; it estimates $\frac{n-1}{n}\sigma^2$. Let's rearrange to isolate just $\sigma^2$, since that's what we're actually interested in.

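$$
\begin{aligned}
E[S_n^2] &= \frac{n-1}{n}\sigma^2 \\
\frac{n}{n-1}E[S_n^2] &= \sigma^2 \\
E\left[\frac{n}{n-1}S_n^2\right] &= \sigma^2 \\
E\left[\frac{n}{n-1}\cdot\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2\right] &= \sigma^2 \\
E\left[\frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2\right] &= \sigma^2
\end{aligned}
$$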

And finally we have the familiar formula for sample variance, with $n - 1$ in the denominator, giving an unbiased estimate of the population variance.
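The relationship between the two versions is easy to see on concrete numbers. The sketch below uses Python's standard `statistics` module, where `pvariance` divides by $n$ and `variance` divides by $n - 1$; the data values are a made-up sample used only for illustration.

```python
import statistics
from fractions import Fraction

# Hypothetical sample, stored as Fractions so the arithmetic stays exact.
data = [Fraction(x) for x in (2, 4, 4, 4, 5, 5, 7, 9)]
n = len(data)

s_n2 = statistics.pvariance(data)  # sum of squared deviations divided by n
s2 = statistics.variance(data)     # sum of squared deviations divided by n - 1

# Rescaling the n-denominator version by n / (n - 1) gives the usual
# sample variance.
assert s2 == Fraction(n, n - 1) * s_n2

print(s_n2, s2)  # 4 32/7
```

The correction factor $n/(n-1)$ is 8/7 here; it shrinks toward 1 as $n$ grows, so the two versions differ most for small samples.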

PRACTICAL APPROACH

Next, let's look at a specific example. Although the theoretical proof above is all we need, it is not very satisfying. Working through an example helps convince us psychologically. But keep in mind that one example (even many examples!) is always weaker than a mathematical proof, since there could be some contradictory example we didn't think of.
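One possible way to try this numerically is a small simulation. This is only a sketch under stated assumptions: the normal population with mean 10 and standard deviation 2, the sample size of 5, the number of trials, and the fixed seed are all illustrative choices, not taken from the text. It draws many samples, computes the variance both ways for each, and averages the results.

```python
import random

random.seed(0)  # fixed seed so repeated runs give the same numbers

mu, sigma = 10.0, 2.0  # hypothetical population: normal, variance 4
n = 5                  # a small sample size makes the bias easy to see
trials = 100_000

sum_with_n = 0.0
sum_with_n_minus_1 = 0.0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    squared_devs = sum((x - xbar) ** 2 for x in sample)
    sum_with_n += squared_devs / n
    sum_with_n_minus_1 += squared_devs / (n - 1)

avg_with_n = sum_with_n / trials
avg_with_n_minus_1 = sum_with_n_minus_1 / trials

print(f"true variance:       {sigma ** 2:.3f}")
print(f"average using n:     {avg_with_n:.3f}")
print(f"average using n - 1: {avg_with_n_minus_1:.3f}")
```

With these settings the average using $n$ should land near $\frac{n-1}{n}\sigma^2 = 3.2$, while the average using $n - 1$ should land near the true variance of 4. The match is approximate because of sampling noise.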
