Confidence Intervals - University of Western States

Confidence Intervals

R LeFebvre, DC 3/8/11

"The neighborhood of the truth."

Imagine that a friend of yours went to a sporting event and you asked him to estimate how many people were in attendance. Your friend responds "about 12,000." You are surprised that there were so many, so you press your friend, trying to see how confident he is in his estimate. Another way of saying this is that you want to see how precise he is being. If he says "somewhere between 10,000 and 14,000," you might think that if his estimate was off (even by his own informal calculation) it wouldn't be more than a couple of thousand either way. That's still a lot of people. On the other hand, he might respond, "well, I suppose it might have been as few as 5,000, but you know it might have been much more--maybe 18 or 20,000." You know immediately that he is not very confident in his original estimate and it is much less precise than you had hoped. Good thing you asked! Clearly, hearing the range (or interval) surrounding the estimate can make a big difference in your confidence in the estimate itself.

In a research setting, statisticians look at data from dozens, hundreds, or thousands of patients. The statistician is challenged to find a single number that represents how the sample population in the study responded to a treatment, a test, or exposure to a risk. If we read that a particular treatment reduced pain by 30%, that number is actually just a statistical estimate (called the point estimate) attempting to reflect how the group responded as a whole. If the same exact study were run again with a similar sample, the statistical result might differ from the first estimate, either higher or lower. Through statistical methods, one can often calculate the range wherein the true result most likely lies if it doesn't turn out to be exactly 30%. The statistician may be able to report with 95% confidence that the actual treatment response lies somewhere between 28 and 33% pain reduction. This range is called the confidence interval. It is nearly always reported at a 95% level of confidence.
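An interval like the 28-33% one above is typically built with a standard formula for a proportion (the Wald interval). The sketch below is illustrative only: the sample size of 1,200 patients is hypothetical, chosen to show how the number of patients drives the width of the interval.

```python
import math

def wald_ci(p_hat, n, z=1.96):
    """95% Wald confidence interval for a proportion.

    p_hat: the observed proportion (the point estimate)
    n: sample size (hypothetical in this example)
    z: critical value; 1.96 corresponds to 95% confidence
    """
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the proportion
    return (p_hat - z * se, p_hat + z * se)

# Hypothetical: 30% pain reduction observed in a sample of 1,200 patients.
lower, upper = wald_ci(0.30, 1200)
print(f"point estimate 0.30, 95% CI ({lower:.3f}, {upper:.3f})")
```

Try reducing n to 100 and the interval widens noticeably, which is exactly the small-study effect discussed below.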

This confidence interval would be written .30 (95% CI = 0.28 - 0.33) or (95% CI 0.28, 0.33). A physician reading this would know that the best single estimate of treatment response for the particular cohort under study was a 30% reduction in pain, but if this estimate turns out to be off, there is only about a 5% probability that it would be less than 28% or more than 33%. We would say this is a narrow confidence interval and that the estimated amount of improvement was relatively precise. For example, in one large study of the Canadian rules for taking cervical x-rays in acute trauma cases, the sensitivity and specificity of the rules for detecting an important fracture were sensitivity 100% (95% CI 98-100%) and specificity 42% (95% CI 40-44%). These confidence intervals are quite narrow, and the reported sensitivity and specificity were both very precise. It is unusual for confidence intervals to be this narrow unless the study was quite large (the general rule is that the larger the study population, the narrower the confidence intervals).

In the same study, the reliability of emergency room doctors properly applying the same rules had a Kappa of .63 (95% CI .49-.77). The Kappa indicates that the doctors' agreement with each other was well above what you would expect from chance alone. Although the best estimate was that the Kappa was 0.63, in reality we can only be confident that it is between 0.49 and 0.77. This is probably an acceptable range (although that might be open for debate), but it clearly is not as good as for the sensitivity and specificity findings. Sometimes confidence intervals are so wide that it is hard to feel very comfortable with the precision of the results being reported.

"Seeing" confidence intervals

It is sometimes easier to appreciate wide vs narrow confidence intervals by seeing them next to each other on a graph called a Forest plot.

In the graph below each horizontal line represents a confidence interval surrounding a study result (the actual single number study result is not visible on this particular graph). You can see that studies 1-4 have short lines representing narrow confidence intervals with good precision. The intervals for studies 5-9 are much wider and consequently much less precise.

It is not unusual that studies with a small number of participants tend to yield results with wide confidence intervals. This is not a fatal flaw in the study, but it does leave us wondering if the true result might be considerably different from the reported estimate. This is one of the reasons that we like to see smaller studies repeated with larger populations.
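The wide-vs-narrow comparison in a Forest plot can be mimicked in a few lines of code. The sketch below draws each interval as a horizontal bar of dashes (a text-only stand-in for the graph described above); the study values are made up purely for illustration.

```python
def ascii_forest(intervals, lo=0.0, hi=2.0, width=60):
    """Render confidence intervals as horizontal bars of dashes.

    intervals: list of (label, lower, upper) tuples -- hypothetical data
    lo, hi: the range of the horizontal axis
    width: number of characters across the plot
    """
    def col(x):
        # Map a value on the axis to a character column.
        return round((x - lo) / (hi - lo) * (width - 1))

    lines = []
    for label, low, up in intervals:
        row = [" "] * width
        for c in range(col(low), col(up) + 1):
            row[c] = "-"
        lines.append(f"{label:>8} |{''.join(row)}|")
    return "\n".join(lines)

# Hypothetical studies: two narrow (precise) intervals, one wide one.
studies = [
    ("study 1", 0.90, 1.05),
    ("study 2", 0.85, 1.00),
    ("study 5", 0.40, 1.70),
]
print(ascii_forest(studies))
```

The short bars jump out immediately as the precise results, just as the short lines do in a published Forest plot.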

Statistical significance

There is a second, even more important reason to look at confidence intervals. For example, in the case of a study that compared a certain therapy to a placebo, one end of the range may cross over the line dividing benefit from no benefit. This would mean that while there is a chance that the true result shows the treatment helped the patient, there is also a chance that it didn't actually help the patient at all. In such a situation, the results are generally discounted as not being statistically significant.

A study compared the use of antibiotics with a placebo for patients with acute sinusitis. The outcome measure was whether the drug would cut down on the number of patients who still had symptoms after 10 days. The results were that 29% of the patients who had taken the antibiotics still had symptoms versus 36% for placebo. At first glance it looks like the antibiotics did better. But once the number of patients and the other factors of the study were considered, the adjusted odds ratio was 0.99. An odds ratio of 1.00 would mean that there was no difference in the odds that a patient on antibiotics would get better faster than one taking a sugar pill. The antibiotic only reduced the odds by 1%. But the confidence interval tells even more of the story. The 95% CI was 0.57-1.73. This means the true result may have ranged anywhere from dropping the odds of a bad outcome by as much as 43% to increasing them by as much as 73%. Since the results straddle the line between helping and not helping, the confidence interval essentially erases even the tiny 1% improvement reported by the point estimate. The conclusion is that there appears to be no statistical or proven clinical benefit to the antibiotic. (Williamson 2008)
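The "straddling" check in this example is mechanical: an odds-ratio interval is statistically significant only if it excludes 1.0. A minimal sketch of that rule, using the interval from the sinusitis study cited above:

```python
def or_ci_significant(lower, upper):
    """True if an odds-ratio CI excludes 1.0, the line dividing
    helping from not helping. If 1.0 falls inside the interval,
    the result is not statistically significant."""
    return not (lower <= 1.0 <= upper)

# Values from the study above: OR 0.99 (95% CI 0.57-1.73).
print(or_ci_significant(0.57, 1.73))  # the interval straddles 1.0, so False
```

The same logic applies to any measure whose "no effect" value is known: 1.0 for odds ratios and relative risks, 0 for a difference in means.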

Looking at the Forest plot below, we can see at a glance if a line crosses the solid vertical line which divides helping from not helping the patient. Study 7 definitely straddles the line between effective and ineffective. If any part of the confidence interval crosses that solid vertical line, the reported treatment benefits are not considered to be statistically significant.

General application

While not all studies report confidence intervals, it is generally preferred that researchers include them in their papers. You can find confidence intervals reported around many different types of data. Examples include sensitivity, specificity, and likelihood ratios in diagnostic studies; number needed to treat (NNT), effect size, and odds ratios in therapy studies; and absolute risk and relative risk in harm studies. They can be reported around anything that is measured, such as range of motion data, pain scales, Oswestry scores, etc. Sometimes standard errors are reported instead of confidence intervals. (The 95% CI can be computed as the point estimate ± 1.96 times the standard error.)
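The standard-error conversion in the parenthetical above is simple enough to sketch directly. The numbers below (a mean Oswestry score change with its standard error) are hypothetical, chosen only to show the arithmetic.

```python
def ci_from_se(point, se, z=1.96):
    """95% CI from a point estimate and its standard error:
    point estimate ± 1.96 × SE."""
    return (point - z * se, point + z * se)

# Hypothetical: a mean Oswestry score change of 12.0 with SE 1.5.
lower, upper = ci_from_se(12.0, 1.5)
print(f"12.0 (95% CI {lower:.2f}-{upper:.2f})")
```

So when a paper gives only a standard error, you can recover the 95% interval yourself and judge its width.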

Bottom line

Always look to see if confidence intervals are reported. They can help put the results of a study into perspective. If you are a student composing a CAT, you should routinely include the confidence intervals.

Check for two things: 1) does the interval appear to be wide or narrow (meaning how precise is the result)? 2) does the interval straddle the line of meaningful distinction (i.e., helps or does not help, diagnoses or does not diagnose, is or is not a risk factor)?

Afterword

You can often tell if a student understands these concepts by reading one of their CATs. Here are examples based on submitted CATs illustrating that a student does not quite "get it."

1. The results were strong because there was a 95% confidence interval.

Why this is a problem: 95% is the norm for confidence intervals, so this tells us nothing. What is the actual confidence interval in this study?

2. The NNT was 8 (95% CI)

Why this is a problem: the actual interval itself is not reported. How can the estimated result and the range be represented by a single number? Shouldn't there be three numbers? For example, 8.0 (95% CI 3.0-11.0).

3. The sensitivity of the test was 89% (95% CI 69-80)

Why this is a problem: The point estimate (89%) does not fall within the interval range!
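Mistakes #2 and #3 can both be caught mechanically. A minimal sanity-checker along these lines (the function and its messages are illustrative, not from any published tool):

```python
def check_ci_report(point, lower, upper):
    """Sanity-check a reported CI against the mistakes listed above.
    Returns a list of problems; an empty list means the report is
    at least internally consistent."""
    problems = []
    if lower is None or upper is None:
        # Mistake #2: "95% CI" claimed but no interval given.
        problems.append("interval bounds not reported")
    elif not (lower <= point <= upper):
        # Mistake #3: the point estimate must fall inside its own interval.
        problems.append("point estimate falls outside the interval")
    return problems

print(check_ci_report(89, 69, 80))      # mistake #3: 89% is outside 69-80%
print(check_ci_report(8.0, 3.0, 11.0))  # a well-formed report: no problems
```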

References

Williamson IG, Rumsby K, Benge S, et al. Are antibiotics or nasal steroids effective for acute sinusitis? The Journal of Family Practice. 2008; vol 3, issue 11, p. 156.
