Statistical Power



Power

STT Consulting

April 2010

dietmar@stt-

stt-

Content

Introduction

Power, sample size, and effect size for 2 variances (F-test); G*Power application

Power analysis with “R”

Power calculations with EXCEL (known σ)

- General (z-test)

- Internal Quality Control (IQC)

Confidence interval for power

Software and references

Statistical Power

In very basic terms, statistical power is the likelihood of achieving statistical significance when a true effect exists. In other words, statistical power is the probability of obtaining a p-value less than 0.05, for example. Obtaining p < 0.05 is exactly what many studies strive for, making the understanding of power calculations “a must”.

A power analysis is typically performed while a study is being planned. It is used to anticipate the likelihood that the study will yield a significant effect. Specifically, the larger the effect size, the larger the sample size, and/or the more liberal the criterion required for significance (alpha), the higher the expectation that the study will yield a statistically significant effect (= the higher the power will be).

These three factors (effect-size, alpha, n), together with power, form a closed system - once any three are established, the fourth is completely determined. The goal of a power analysis is to find an appropriate balance among these factors by taking into account the substantive goals of the study, and the resources available to the researcher.
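
To illustrate this closed system in “R” (the route listed in the Content), base R's power.t.test() solves for whichever of n, effect size (delta), alpha (sig.level), or power is left unspecified; the numbers below are arbitrary examples:

# Closed system: any three of n, delta, sig.level, and power determine the fourth.
power.t.test(n = 20, delta = 0.8, sd = 1, sig.level = 0.05)        # solves for power
power.t.test(delta = 0.8, sd = 1, sig.level = 0.05, power = 0.80)  # solves for n
power.t.test(n = 20, sd = 1, sig.level = 0.05, power = 0.80)       # solves for delta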

Role of Effect Size

The term "effect size" refers to the magnitude of the effect under the alternate hypothesis. The nature of the effect size will vary from one statistical procedure to the next, but its function in power analysis is the same in all procedures.

The effect size should represent the smallest effect that would be of clinical, analytical, or other significance. In clinical trials, for example, the selection of an effect size might take account of the severity of the illness being treated (a treatment effect that reduces mortality by one percent might be clinically important, while a treatment effect that reduces transient asthma by 20% may be of little interest). It might also take account of the existence of alternative treatments (if alternative treatments exist, a new treatment would need to surpass them to be important).

Role of Alpha

Traditionally, researchers in some fields have accepted the notion that alpha should be set at 0.05 and power at 80% (corresponding to a beta of 0.20). This implicitly assumes that a type I error is four times as harmful as a type II error (the ratio of alpha to beta is 0.05 to 0.20), an assumption that has no basis in fact. Rather, it should fall to the researcher to strike a balance between alpha and beta as befits the issues at hand. For example, if the study will be used to screen a new drug for further testing, we might want to set alpha at 0.20 and power at 95%, to ensure that a potentially useful drug is not overlooked. On the other hand, if we were working with a drug that carried the risk of side effects and the study goal was to obtain FDA approval for use, we might want to set alpha at 0.01 while keeping power at 95%.

Role of Sample Size

For any given effect size and alpha, increasing the sample size will increase the power.
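
A minimal “R” check of this statement, with an arbitrary effect size (delta = 0.5 SD) and alpha = 0.05:

# Power grows with n when delta and alpha are held fixed (two-sample t-test).
sapply(c(10, 20, 50, 100), function(n)
  power.t.test(n = n, delta = 0.5, sd = 1, sig.level = 0.05)$power)
# about 0.19, 0.34, 0.70, 0.94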

Variation in the data (imprecision)

As always, high variation gives poor estimates (e.g., of power) unless the sample size is high. Note also that the standard deviation used in a power analysis is often taken from a pilot study. Therefore, it may be appropriate to calculate confidence intervals for power (Taylor DJ, Muller KE. American Statistician 1995;49:43-47; Tarasinska J. Statistics & Probability Letters 2005;73:125-130).

Illustration of the power concept

In very basic terms, statistical power is the likelihood of achieving statistical significance. In other words, statistical power is the probability of obtaining a p-value less than 0.05, for example: we wish to confirm an effect!

When performing a power analysis, we first have to define the α-error (2-sided). Here, we set it at the 5% level (p = 0.05, z = 1.96).

Under null-hypothesis conditions, we get p < 0.05 in 5% of the cases; however, these are false positives (no effect was introduced).
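
A small “R” sketch of this illustration: the probability that a two-sided z-test at α = 0.05 yields p < 0.05 when the population is shifted by k standard errors. With k = 0 (null hypothesis true) it reproduces the 5% false positives; with k > 0 it gives the power of detecting the shift, as discussed next:

# Rejection probability of a two-sided z-test (alpha = 0.05) under a shift
# of k standard errors; k = 0 gives the false-positive rate.
z.power <- function(k, alpha = 0.05) {
  zc <- qnorm(1 - alpha / 2)          # 1.96 for alpha = 0.05
  pnorm(-zc - k) + 1 - pnorm(zc - k)
}
z.power(0)   # 0.05
z.power(2)   # ~0.52
z.power(3)   # ~0.85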

When we introduce an effect (here: a shift of the population by k x standard error), the frequency of p < 0.05 increases; this frequency is the power of the test.

Example: power, sample size, and effect size for 2 variances (F-test)

You want to demonstrate that the imprecision of gravimetric control is smaller than that of the established volumetric control (VARvol = 64, i.e., SDvol = 8; see below). You measure n = 6 aliquots with each technique and compare the variances with the F-test (>Tests with estimates >F-test): p Value = 0.1627.

You are disappointed; gravimetric control may be better, but you could not demonstrate it.

Several questions arise about what it would have taken to get a significant test (α = 0.05):

1. What was the power of the initial experiment?

2. How many aliquots should you have measured at a given power (e.g., 0.9)?

3. How small should SDgrav have been with n = 6 at a given power (e.g., 0.9)?

We address these with the free G*Power software.

In G*Power, select: F-tests >Variance: Test of equality (2 sample case)

Determine the effect size (note: the software wants the variance ratio!)

More background material:

G*Power print-screens

1. What was the power of the initial experiment?

[Figure: G*Power print-screen, post-hoc power analysis]

The power was roughly 24%
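
The same result can be reproduced in “R” from the central F distribution. The raw SDs of the experiment are not shown here; the variance ratio below (VARgrav/VARvol ≈ 0.39) is a hypothetical value, chosen to be consistent with p = 0.1627 at df = 5,5 for a 1-sided test (the 1-sided F-test is also used for the power curves later in this tutorial):

# Power of the 1-sided F-test that variance 1 is SMALLER than variance 2;
# ratio = var1/var2 under H1 (< 1). Reject H0 when s1^2/s2^2 < qf(alpha, df1, df2).
f.var.power <- function(n1, n2, ratio, alpha = 0.05) {
  df1 <- n1 - 1; df2 <- n2 - 1
  pf(qf(alpha, df1, df2) / ratio, df1, df2)
}
f.var.power(6, 6, ratio = 0.39)   # ~0.24, matching the G*Power result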

2. How many aliquots should you have measured at a given power (e.g., 0.9)?

[Figure: G*Power print-screen, a priori sample-size calculation]

You should have measured 41 aliquots each. Again, you are disappointed.
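
Reusing f.var.power() from the previous page (with the same hypothetical ratio), the sample size can be found by simple search:

# Q2: smallest n per method that reaches 90% power at variance ratio 0.39.
n <- 2
while (f.var.power(n, n, ratio = 0.39) < 0.90) n <- n + 1
n   # about 41 aliquots per method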

3. How small should SDgrav have been with n = 6 at a given power (e.g., 0.9)?

[Figure: G*Power print-screen, required effect size (sensitivity analysis)]

Note that the required variance RATIO is 0.0573. Assuming the variance of the old method was representative (VARvol = 64), VARgrav should have been 64 x 0.0573 = 3.67, and SDgrav should have been SQRT(3.67) = 1.92.
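
This 0.0573 can be verified directly: with a 1-sided α = 0.05 and df = 5,5, a power of 0.9 requires qf(0.05, 5, 5)/ratio = qf(0.90, 5, 5), hence in “R”:

# Q3: required variance ratio for 90% power with n = 6 per group (df = 5,5).
qf(0.05, 5, 5) / qf(0.90, 5, 5)   # = 0.0573, as G*Power reports
# VARgrav = 64 * 0.0573 = 3.67, so SDgrav = sqrt(3.67) = 1.92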

Conclusion

It is very difficult to demonstrate an improvement of imprecision with a low number of measurements (small n; see the power curves below). Power calculations are available at stt- >Statistics >Power Tutorial; similarly, for sample sizes: stt- >Statistics >Power.

Internal quality control

The EXCEL file Power Tutorial also contains power functions for internal quality control rules.

The figure shows power curves for detecting systematic error (expressed in multiples of the stable standard deviation) with σ-rules (n = 1): 1.96, 2.5, 3, and 3.5 (from left to right).

[Figure: power curves for the 1.96σ, 2.5σ, 3σ, and 3.5σ rules]

The figure demonstrates that rules with smaller σ limits are more powerful; however, they also have a higher probability of FALSE rejection (see the rejection probability at a systematic error of 0).
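
A minimal “R” sketch of these curves for a single control value (n = 1) with limits at ±kσ; the systematic error delta is expressed in multiples of the stable standard deviation:

# Rejection probability of a k-sigma control rule under systematic error delta.
iqc.power <- function(delta, k) pnorm(-k - delta) + pnorm(delta - k)
delta <- seq(0, 4, by = 0.5)
round(sapply(c(1.96, 2.5, 3, 3.5), function(k) iqc.power(delta, k)), 3)
# Row 1 (delta = 0) gives the false-rejection rates: 0.05, 0.012, 0.003, ...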

Educative power curves

Some educative power curves for the 1-sided F-test (α = 0.05) and others, generated with G*Power (Table copied to EXCEL), can be found at stt- >Statistics >Power Tutorial.

[Figure: power curves for the 1-sided F-test at various sample sizes]

The figure shows that power curves for the F-test with low n are quite “flat” and reach a desirable power (e.g., 0.9) relatively late. For sufficient power of F-tests, sample sizes >10 are desirable.
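
The flat shape at low n can be reproduced with the f.var.power() sketch from above:

# Power curves of the 1-sided F-test for several sample sizes per group.
ratios <- seq(0.01, 1, by = 0.01)
plot(ratios, f.var.power(6, 6, ratios), type = "l", ylim = c(0, 1),
     xlab = "variance ratio (var1/var2 under H1)", ylab = "power")
for (n in c(4, 11, 21)) lines(ratios, f.var.power(n, n, ratios), lty = 2)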

Don’t forget confidence intervals

When the standard deviation has been estimated (e.g., from a pilot study), the calculated power is itself only an estimate and should be accompanied by a confidence interval.

Taylor DJ, Muller KE. Computing confidence bounds for power and sample size of the general linear univariate model. American Statistician 1995;49:43-47.

[Figure: confidence bounds for power]

See also: Tarasinska J. Confidence intervals for the power of Student's t-test. Statistics & Probability Letters 2005;73:125-130.
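
As a crude plug-in sketch only (not the exact methods of the papers above): a chi-square confidence interval for σ, estimated from a pilot study with df degrees of freedom, can be propagated into an interval for the power of a two-sample t-test. All numbers are hypothetical:

# Plug the chi-square confidence limits for sigma into the power calculation.
power.ci <- function(n, delta, s, df, conf = 0.95) {
  a   <- (1 - conf) / 2
  sig <- s * sqrt(df / qchisq(c(1 - a, a), df))   # lower, upper limit for sigma
  sort(sapply(sig, function(sd)
    power.t.test(n = n, delta = delta, sd = sd, sig.level = 0.05)$power))
}
power.ci(n = 20, delta = 5, s = 6, df = 10)   # wide interval, about 0.3 to 0.96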

Software and references

Free software

G*Power



Background/educational material:

Others



Commercial software





Education



Educational text

Hypothesis Testing and Statistical Power of a Test. Hun Myoung Park, Ph.D.


