These worksheets were created by Mike Bryant at Santa Maria High School, 901 S. Broadway, Santa Maria, CA 93455. Feel free to use them, modify them, and share them. But PLEASE, PLEASE, PLEASE share with me any improvements you make. And if anyone out there wants to do the same thing for The Lady Tasting Tea, I would love to not have to create them myself.


Read the Introduction and Chapter 1 to How to Lie With Statistics. Answer the following questions and be ready to discuss the chapter in class.

1. Choose one of the quotations inside the front cover and discuss how it relates to the Introduction.

2. What does the author mean in the fourth paragraph of Chapter 1, “It is quite improbably salubrious?”

3. List as many sources of sample bias as you can that are mentioned in Chapter 1 and provide an example of each.

4. Put the second paragraph on Page 18 (“A river cannot….”) into your own words.

5. What is the advantage of a stratified random sample and what difficulties does it pose, according to this chapter?

6. On Page 26, the author suggests that most polls are biased in the direction of the Literary Digest error.

a. What incident does this refer to?

b. That incident took place during (and arguably because of) the Great Depression. Are the lessons learned from that still relevant for us today? Why?

c. In what direction is that bias?

7. Comment on the last paragraph of Chapter 1.

Read the Introduction and Chapter 1 to How to Lie With Statistics. Answer the following questions and be ready to discuss the chapter in class.

1. Choose one of the quotations inside the front cover and discuss how it relates to the Introduction.

a. Disraeli: In a fact-minded culture, those without honesty or understanding use statistical language to sensationalize or oversimplify (Pg 8).

b. Wells: We are deluged with data (father-in-law’s newspaper) and claims based on data (antihistamine pills, Pg 8).

c. Ward: Poorly designed procedures or improperly applied statistics can make us confident in conclusions that are not true at all (newspaper, antihistamine, Pg 7-8). It is harder to become disabused of these things we feel confident in (after all, they had a basis in statistical analysis) than to deal with the things we know we are ignorant about.

2. What does the author mean in the fourth paragraph of Chapter 1, “It is quite improbably salubrious?” (Pg 11)

It is unlikely that this figure is healthy, that it will stand up to closer inspection.

3. List as many sources of sample bias as you can that are mentioned in Chapter 1 and provide an example of each.

a. Response errors:

i. Inaccurate knowledge or memory of the respondent. Most people don’t know their own annual income to the nearest dollar (Pg 12).

ii. Exaggeration or minimization. Some exaggerate their income out of vanity or minimize it out of fear of the taxman (Pg 12). Harper vs. True Story (Pg 16). Tooth brushing frequency (Pg 17).

iii. Giving an answer expected to be pleasing to the one asking. Japanese vs. Nazis (Pg 24).

iv. Non-response. Many will choose not to respond to a questionnaire perceived to be personal (Pg 14). Many of those that do not respond to an income question will be those whose incomes are low (Pg 15).

b. Sampling procedure: Under-representation can come from several sources.

i. Less likely to be able to easily locate the Yale men who were less successful (Pg 13). This makes our sample fail to be representative of our population. The least successful people cannot afford to attend reunions, so don’t bother to keep the college informed of their whereabouts (Pg 15).

ii. Convenience samples. “All kinds of people can be found in a [railroad] station.” (Pg 21)

iii. Selecting subjects you are more comfortable talking to (Pg 25, 26).

iv. Difficulties in designing or collecting a stratified sample (discussed below).

c. The source of the bias (or even its existence) may not be readily apparent (Landon/Roosevelt race, Pg 19-20).
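For class discussion, here is a minimal Python sketch of the non-response problem in item a.iv. Every figure in it (the income groups, their shares of the population, and their response rates) is invented for illustration and does not come from the book. It only assumes that people with lower incomes are less likely to reply.

```python
# Hypothetical population: (annual income, share of population, response rate).
# All figures are invented for illustration; none come from the book.
GROUPS = [
    (20_000, 0.50, 0.20),   # lower incomes, least likely to answer
    (40_000, 0.40, 0.50),
    (100_000, 0.10, 0.80),  # higher incomes, most likely to answer
]

def true_mean(groups):
    """Mean income of the whole population."""
    return sum(income * share for income, share, _ in groups)

def respondent_mean(groups):
    """Expected mean income among the people who actually reply."""
    weights = [share * rate for _, share, rate in groups]
    total = sum(income * w for (income, _, _), w in zip(groups, weights))
    return total / sum(weights)

print(round(true_mean(GROUPS)))        # 36000
print(round(respondent_mean(GROUPS)))  # 47368
```

Under these assumptions the survey overstates the average income by nearly a third, even though every respondent answers honestly.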

4. Put the second paragraph on Page 18 (“A river cannot….”) into your own words.

If the data upon which a statistical study is based are not representative of the population, then neither can the statistic nor the conclusion be accurate. But a conclusion phrased in statistical language and referencing a “statistical study” sounds so erudite that we tend to accept it at face value, without examining the solidity of that foundation. See the quote by Artemus Ward inside the front cover.

Discussion: Cancer data based on reconstructed records didn’t include many who had lived to move from the area (Pg 18). The psychiatrist, observing a steady diet of his own patients, concludes that everyone is neurotic (Pg 19).

5. What is the advantage of a stratified random sample and what difficulties does it pose, according to this chapter?

A purely random sample is difficult and expensive to obtain; a stratified random sample is a more economical substitute. But difficulties with a stratified random sample include basing the strata on incorrect assumptions (Pg 21), problems in identifying subjects who fit in your strata (Pg 22), and the question of how to achieve randomization while maintaining the strata (Pg 22). The last of these can be especially trying if multiple variables are being considered.

6. On Page 26, the author suggests that most polls are biased in the direction of the Literary Digest error.

a. What incident does this refer to?

Landon/Roosevelt race in 1936 (Pg 20).

b. That incident took place during (and arguably because of) the Great Depression. Are the lessons learned from that still relevant for us today? Why?

See Question c.

The prevalence of cell phones, which are not normally called in telephone polls, became a subject of concern in polls during the 2004 Bush-Kerry presidential race. Cell-phone-only users tend to be disproportionately young, and many feared (or hoped) that the polls showing Bush in the lead were under-representing young voters, who tend to be more liberal. As the number of cell phone users continues to grow, this may turn into a real challenge for pollsters.

c. In what direction is that bias?

The bias is “toward the person with more money, more education, more information and alertness, better appearance, more conventional behavior and more settled habits than the average of the population…” (Pg 26).

7. Comment on the last paragraph of Chapter 1.

Though the title of the book refers to lying with statistics, the greater danger is that statistics can be misleading due to sample bias even with the best-intentioned study.

Read Chapter 2 of How to Lie With Statistics. Answer the following questions and be ready to discuss the chapter in class.

1. Chapter 2 begins with an illustration in which someone is snobbish about an income of $15,000 a year. Why did the author choose such a small income figure when it seems that a much higher one would fit the context better?

2. Summarize Chapter 2 in a few sentences.

3. What did the hypothetical business partners do in Chapter 2 to make the distribution of earnings seem more palatable to the employees?

4. When we see an average reported, what do we need to ask besides which kind of average is being used? Why?

5. What confidence level was used in the Census Bureau’s report referred to in this chapter? How do you know? Is the way that this confidence level is described a correct interpretation of the confidence level?

6. Which kind of “average” would best describe the height of students at our high school? Why?

7. Which kind of “average” (statisticians call all three “measures of central tendency”) would give me the best way to compare the performance of two classes of a required math course? Why?

Read Chapter 2 of How to Lie With Statistics. Answer the following questions and be ready to discuss the chapter in class.

1. Chapter 2 begins with an illustration in which someone is snobbish about an income of $15,000 a year. Why did the author choose such a small income figure when it seems that a much higher one would fit the context better?

The book was written in 1954!

2. Summarize Chapter 2 in a few sentences.

The term “average” is often used to hide that a particular measure (mean, median, or mode) has been chosen in order to make the desired impression. “Average” is not a helpful term because its meaning is too broad. Examples: neighborhood average income (Pg 27-28), advertised average pay (Pg 31), average magazine reader (Pg 36).

3. What did the hypothetical business partners do in Chapter 2 to make the distribution of earnings seem more palatable to the employees?

They took a large share of their income as bonuses, which were not reported as “profit.” They included their own salaries in the “average wage or salary,” but not in the “average profit of owners.” And they used the mean rather than the median, so that their own considerable salaries would raise the average. Then, on the poster, they reported the profit as a percentage of the company’s income, to make the wages sound even higher (Pg 34).
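To see how much the choice of measure matters here, the following is a rough Python sketch using a made-up payroll. The wages and salaries are hypothetical and are not the figures from the book. It compares the mean with the owners' salaries included against the workers' own mean and the median.

```python
from statistics import mean, median

# Hypothetical payroll; the figures are invented, not taken from the book.
worker_wages = [30_000] * 12 + [40_000] * 6 + [50_000] * 2
owner_salaries = [250_000] * 2

everyone = worker_wages + owner_salaries

print(round(mean(everyone)))   # 54545: the "average wage or salary" with owners included
print(mean(worker_wages))      # 35000: the workers' own mean
print(median(everyone))        # 30000.0: what the typical person on the payroll earns
```

Two large salaries are enough to lift the reported "average" well above what any ordinary employee in this made-up company is paid.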

4. When we see an average reported, what do we need to ask besides which kind of average is being used? Why?

What is being included in the average? Examples: US Steel report of increase in average weekly earnings that ignored a change in the number of partially employed people (Pg 35). Average profit manipulation (Pg 34). Meaning of family (as well as “average”) in census report (Pg 35).

5. What confidence level was used in the Census Bureau’s report referred to in this chapter? How do you know? Is the way that this confidence level is described a correct interpretation of the confidence level?

95%. It said there were 19 chances out of 20 that the estimate was correct within the margin of error (Pg 35). This is not a correct interpretation, since the estimate is either correct (100%) or not (0%). Our 95% confidence that the true median lies within the stated interval means that our procedures will generate an interval that captures the true median 95% of the time.
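The "95% of the time" idea can be checked by simulation. The sketch below is only an illustration under simplifying assumptions that are mine, not the Census Bureau's: it estimates a mean rather than a median, draws from a normal population with a known standard deviation, and uses invented values for `mu`, `sigma`, and the sample size.

```python
import random

def coverage(trials=2000, n=50, mu=100.0, sigma=15.0, z=1.96, seed=1):
    """Fraction of simulated 95% intervals that capture the true mean."""
    rng = random.Random(seed)
    half_width = z * sigma / n ** 0.5   # sigma is treated as known, for simplicity
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        centre = sum(sample) / n
        # Each individual interval either contains mu or it does not.
        if centre - half_width <= mu <= centre + half_width:
            hits += 1
    return hits / trials

print(coverage())  # close to 0.95; the exact value depends on the seed
```

Any single interval is simply right or wrong. The 95% describes how often the procedure succeeds over many repetitions.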

6. Which kind of “average” would best describe the height of students at our high school? Why?

There would be little difference between the three measures, since heights tend to be normally distributed (Pg 29).

7. Which kind of “average” (statisticians call all three “measures of central tendency”) would give me the best way to compare the performance of two classes of a required math course? Why?

The median would be a better measure, since most required courses see the majority of the student grades falling within a relatively narrow passing range, but the failing grades spread over a much larger range. The mean would be sensitive to the scores of those few students who never showed up for class.
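A short Python sketch with invented scores (not real class data) shows how the two measures can disagree. Class A has two students who never showed up and scored zero; Class B has no such students but generally lower scores.

```python
from statistics import mean, median

# Hypothetical percentage scores, invented for illustration.
class_a = [0, 0, 72, 75, 78, 80, 82, 85, 88, 90]
class_b = [60, 62, 65, 68, 70, 70, 72, 74, 75, 76]

print(mean(class_a), median(class_a))  # 65 79.0
print(mean(class_b), median(class_b))  # 69.2 70.0
```

By the mean, Class B looks stronger; by the median, Class A does. The two zeros drive the difference.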

Read Chapter 3 of How to Lie With Statistics. Answer the following questions and be ready to discuss the chapter in class.

1. Explain why advertisers often rely on a very small sample to substantiate their claims.

2. We have learned that a proper sample of only 1,000 individuals from the entire population of the United States can give us results with only a 3% margin of error (assuming p=.5, at a 95% confidence level). Why, then, does the author suggest that a sample of over a thousand can be much too small in some situations?

3. Chapter 3 is titled “The Little Figures That Are Not There.” This chapter discusses at least five different kinds of information that may be missing when we are faced with a claim based in data. See how many of these missing pieces of information you can find mentioned in the chapter.

4. On Page 44 the author uses two words that may be new to you: rotogravure and invidious. Provide a brief definition of each and explain how the author is using the word in this context.

5. What does the author mean on Page 45 when he says, “Hardly anybody is exactly normal in any way…?”

6. How can the graph of the advertising agency’s business be misleading when the graph clearly shows such an upward trend?

7. When reviewing scholarship applications, I was given a table that listed the GPA of each applicant. What other information do I need in order to evaluate these applicants’ academic performance?

Read Chapter 3 of How to Lie With Statistics. Answer the following questions and be ready to discuss the chapter in class.

1. Explain why advertisers often rely on a very small sample to substantiate their claims.

With a small sample, the differences that occur due to chance create large variations in the proportions, which the advertisers can exploit to give a false impression of the effectiveness of their product. With large samples, the differences produced by chance tend to be a smaller proportion of the whole (Pg 39). Remember the Law of Large Numbers (YMM Pg 390)?
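As a rough check on this idea, here is a small Python sketch that computes exactly how often a fair coin gives at least 80% heads in a small run of tosses and in a larger one. The toss counts and the 80% threshold are my own choices for illustration.

```python
from math import comb

def prob_at_least(n, k):
    """P(at least k heads in n tosses of a fair coin)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

print(round(prob_at_least(10, 8), 4))   # 0.0547, roughly 1 run in 18
print(prob_at_least(100, 80) < 1e-8)    # True: essentially never
```

An advertiser who keeps repeating a 10-subject test will see a striking result by chance alone before long; with 100 subjects that is not a realistic strategy.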

2. We have learned that a proper sample of only 1,000 individuals from the entire population of the United States can give us results with only a 3% margin of error (assuming p=.5, at a 95% confidence level). Why, then, does the author suggest that a sample of over a thousand can be much too small in some situations?

When dealing with probabilities that are very small, a much larger sample may be needed. The author used the example of the polio vaccine study, where the usual incidence rate would only lead us to expect two cases in the sample of over 1100 that was used (Pg 40).
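Both halves of this question can be put into numbers with a short Python sketch. The margin-of-error formula is the usual one for a proportion. The second function uses a Poisson approximation, which is my assumption rather than anything in the book, to ask how often a group expected to show two cases would show none at all purely by chance.

```python
from math import exp, sqrt

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion."""
    return z * sqrt(p * (1 - p) / n)

def prob_zero_cases(expected_cases):
    """Poisson approximation to the chance of seeing no cases at all."""
    return exp(-expected_cases)

print(round(margin_of_error(1000), 3))  # 0.031, the familiar 3%
print(round(prob_zero_cases(2), 3))     # 0.135
```

With only two cases expected, seeing zero happens about one time in seven by chance under this approximation, so a case-free group says very little about the vaccine.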

3. Chapter 3 is titled “The Little Figures That Are Not There.” This chapter discusses at least five different kinds of information that may be missing when we are faced with a claim based in data. See how many of these missing pieces of information you can find mentioned in the chapter.

a. The degree of significance, how likely it is that a test figure represents something produced by chance (Pg 42).

b. The range or deviation from the average (Pg 42).

i. Houses built for the 3.6-person average family, which only made up 45% of the population (Pg 43). (This was from the 1950 census. The 2000 census shows the average family size has fallen to 3.14, but defines family as two or more related persons living together, so this figure does not include homes with one individual. The average household size is only 2.59.)

ii. Gesell’s norms (Pg 44).

iii. Oklahoma City average temperature, with a range of 130˚ (Pg 52).

c. What is being compared to, such as in the claim that a treatment made steel three times harder (Pg 47).

d. What is meant by a word, such as in the claim that electric power was “available” to ¾ of the farms (Pg 47).

e. Labels or scales on graphs. Grape Nuts energy released (Pg 50). Advertising agency’s business trend (Pg 51).

4. On Page 44 the author uses two words that may be new to you: rotogravure and invidious. Provide a brief definition of each and explain how the author is using the word in this context.

Rotogravure is something printed using a particular kind of printing press, particularly the color inserts in a Sunday newspaper. Invidious means unfair or unjust, with respect to a comparison. To assume that there is something wrong with a child when he doesn’t do something by the age listed in Gesell’s norms is an unfair conclusion.

5. What does the author mean on Page 45 when he says, “Hardly anybody is exactly normal in any way…?”

The majority of any measurements do not fall on the average. So knowing that something is above or below average tells us very little unless we know something about the range of values around that average.

6. How can the graph of the advertising agency’s business be misleading when the graph clearly shows such an upward trend? (Pg 51)

a. Without a label for the vertical axis, we are not even sure what is being measured. Is this total income before expenses? (The expenses may have risen more than the income!) Is it adjusted for inflation? Is it number of clients? (The company may have fallen on hard times and now be dealing with a larger number of very small clients.)

b. What is the scale? Even the smallest increase can be made to have a slope that looks quite dramatic if the scale is chosen carefully. (More on this in a later chapter.)

7. When reviewing scholarship applications, I was given a table that listed the GPA of each applicant. What other information do I need in order to evaluate these applicants’ academic performance?

a. What is it that the average is of? That is, what courses did the student take? One student might achieve a 4.0 GPA taking the easiest courses, while another that only has a 3.6 average took all of the most difficult offerings.

b. What is the student’s distribution of grades? Did he achieve a 3.5 GPA by getting all A’s and B’s, or did he have all A’s for the first three years and several recent F’s?

The average only tells part of the story, and can often be misleading!

Read chapters 4, 5 and 6 of How to Lie With Statistics. Answer the following questions and be ready to discuss the chapters in class.

1. Select one sentence in Chapter 4 that you believe best summarizes the main points of this chapter.

2. The author suggests that some reported differences may not be real differences at all and others, even though they can be shown to be real differences, should still be ignored. Explain how each of these two situations can arise and give an example of each.

3. What are the steps in producing a “gee-whiz” graph?

4. Here is a reproduction of a Chevy ad that appeared a few years ago.

The scale was printed in white on a gray background, so you had to be looking for it to notice it. If you didn’t notice the scale, what percent would you assign to the Nissan? Redraw this graph using a scale from 0 to 100. How would your version look as an ad?

5. Why is it that making one picture twice as tall as another on a picture graph leaves an impression that one quantity is eight times as big?

6. Chapter 5 talks about the wrong way to make a picture graph. Is there a right way? If so, how do you suppose it could be constructed to avoid leaving a false impression?

7. Try to find an example of a misleading graph in a newspaper or magazine.

Read chapters 4, 5 and 6 of How to Lie With Statistics. Answer the following questions and be ready to discuss the chapters in class.

1. Select one sentence in Chapter 4 that you believe best summarizes the main points of this chapter.

My personal favorite is “This is in defiance of the fine old saying that a difference is a difference only if it makes a difference” (Pg 58). Other good choices include “What this comes down to is that the only way to think about IQs and many other sampling results is in ranges,” “But comparisons between figures with small differences are meaningless,” and “You must always keep that plus-or-minus in mind, even (or especially) when it is not stated” (Pg 57).

2. The author suggests that some reported differences may not be real differences at all and others, even though they can be shown to be real differences, should still be ignored. Explain how each of these two situations can arise and give an example of each.

With small differences, the variation to be expected in any sampling situation often makes it impossible to tell whether the difference is real or is simply due to that variation. Examples: IQ comparisons (Pg 53-57), magazine article readership (Pg 57).

Even if a difference can be demonstrated with certainty, it may be so small as to be meaningless. Old Gold cigarette ads (Pg 58-59). “Statistically significant” does not equate to “significant” in terms of importance.

3. What are the steps in producing a “gee-whiz” graph?

Truncate the bottom (Pg 62). Expand the scale on the y-axis (Pg 63).
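The effect of the first step can be quantified with a short Python sketch. The two values and the baseline are hypothetical. The function reports how many times taller the second bar or point is drawn than the first when the vertical axis starts at a given baseline.

```python
def apparent_ratio(first, second, baseline=0.0):
    """Ratio of drawn heights when the vertical axis starts at `baseline`."""
    return (second - baseline) / (first - baseline)

# Hypothetical values: a rise from 20 to 22, which is a 10% increase.
print(apparent_ratio(20, 22))               # 1.1 with the axis starting at zero
print(apparent_ratio(20, 22, baseline=18))  # 2.0 with the bottom chopped off at 18
```

The data have not changed, but on the truncated axis a 10% rise is drawn as a doubling.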

4. Here is a reproduction of a Chevy ad that appeared a few years ago.

The scale was printed in white on a gray background, so you had to be looking for it to notice it. If you didn’t notice the scale, what percent would you assign to the Nissan? Redraw this graph using a scale from 0 to 100. How would your version look as an ad?

5. Why is it that making one picture twice as tall as another on a picture graph leaves an impression that one quantity is eight times as big?

When we double the height of the picture, in order to make it appear proportional we normally double the length as well. This makes the area four times as large. But our mind perceives the picture as representing a three-dimensional object, which would then have eight times the volume (Pg 69-70).
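The arithmetic generalizes to any scale factor, as this small Python sketch shows. The function name is my own.

```python
def scaling_factors(k):
    """How area and volume grow when every linear dimension is multiplied by k."""
    return {"height": k, "area": k ** 2, "volume": k ** 3}

print(scaling_factors(2))  # {'height': 2, 'area': 4, 'volume': 8}
print(scaling_factors(3))  # {'height': 3, 'area': 9, 'volume': 27}
```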

6. Chapter 5 talks about the wrong way to make a picture graph. Is there a right way? If so, how do you suppose it could be constructed to avoid leaving a false impression?

Instead of making one picture larger than another, use multiple copies of pictures of the same size to construct the bars of the graph. Then if we double the length of the bar it only doubles the area and volume, so is not misleading.

7. Try to find an example of a misleading graph in a newspaper or magazine.

Read Chapter 7 of How to Lie With Statistics. Answer the following questions and be ready to discuss the chapter in class.

1. What is a semi-attached figure?

2. Name as many general strategies from this chapter as you can for using the semi-attached figure. (I count roughly half a dozen.)

3. Here is an excerpt from a guest commentary column in the Santa Maria Times by Ron Fink (May 17, 2005). See how many semi-attached figures you can spot.

The North [Santa Barbara] County routinely meets state air quality standards and the South Coast does not. Why is that? ... The South Coast doesn’t have any more industrial pollution emitters than does the north. Failure to meet the standard may be connected to the 71 tons of methane that is released daily from natural seeps off of our coastline, not any human activities. You see, the instruments used to measure air contaminates [sic] cannot discriminate between natural and man-made pollutants.

Nature is not perfect when judged by man’s rules for polluting the environment. Anyone with a pollen-associated sinus problem can readily attest to the impact that nature has on our daily lives.

And what about those greenhouse gases, the purported cause of global warming, where do they come from? A major source is volcanoes…. Satellite data after the 1991 eruptions of Mt Pinatubo (the Philippines) and Mt Hudson (Chile) showed a 15-20 percent ozone loss at high latitudes and a greater than 50 percent loss over the Antarctic! … The U.S. Geological Survey determined that: globally, large volcanic explosive eruptions that inject a tremendous volume of sulfur aerosols into the stratosphere can lead to lower surface temperatures and promote depletion of the Earth’s ozone layer. Ash from such events [as the Mt St Helens eruptions] can travel 100 miles or more downwind!

So despite what editorial writers, local environmentalists and movie makers want you to believe, its [sic] nature that is responsible for the greatest changes in nature, not human activity or President Bush’s environmental policies. I have often wondered how we mere humans could equal the magnitude of pollution created by nature no matter how many oil wells we drill, power plants we build, air-conditioners we operate, hair spray we use, cars we paint or SUV’s we drive.

4. See if you can find an example of a semi-attached figure. They are generally easier to find in writing that is meant to be persuasive, such as advertising or commentaries.

Read Chapter 7 of How to Lie With Statistics. Answer the following questions and be ready to discuss the chapter in class.

1. What is a semi-attached figure?

A figure, fact, or argument that seems at first glance to be related to the point that is being made, but upon closer inspection is not.

2. Name as many general strategies from this chapter as you can for using the semi-attached figure. (I count roughly half a dozen.)

a. Prove something that sounds like it has to do with your topic. (Number of germs killed by a cold nostrum. Are 31,000 germs a lot? In what concentration? What kind of germs? Do germs have anything to do with colds? Pg 74-75.)

b. Present testimony from someone well known or respected. (What brand of cigarettes do doctors smoke? Pg 76-77.) Are we more convinced Jello Pudding is good because Bill Cosby likes it? What brand of motor oil does a NASCAR driver use?

c. Present a figure that doesn’t mean what it sounds like. (Extracts 26% more juice, Pg 77.) I was recently perplexed when I looked at the label on my 100% juice container and saw all of the ingredients. Looking closer, the label reads, “100% Juice and other ingredients.” There is obviously something more to the legal definition of 100% juice than there is to the mathematical definition.

d. Make comparisons between things that aren’t really comparable. (Fatalities morning vs. evening, Pg 78; death rate in Navy vs. New York City, Pg 83; teacher salary in lowest district vs. those in New York City, Pg 85.)

e. Count something and report the total as something not quite the same. (Have a complaint vs. opposed to the union, Pg 80.)

f. Use percentages of large or small sums to make quantities seem smaller or larger. (GM profit on sales vs. profit on investment, Pg 81.)

g. Confuse a change in the rate of reporting with a change in the rate of incidence. (Influenza or malaria in South, Pg 82-83; increase in polio cases, Pg 84.)

3. Here is an excerpt from a guest commentary column in the Santa Maria Times by Ron Fink (May 17, 2005). See how many semi-attached figures you can spot.

The North [Santa Barbara] County routinely meets state air quality standards and the South Coast does not. Why is that? ... The South Coast doesn’t have any more industrial pollution emitters than does the north. Failure to meet the standard may be connected to the 71 tons of methane that is released daily from natural seeps off of our coastline, not any human activities. You see, the instruments used to measure air contaminates [sic] cannot discriminate between natural and man-made pollutants.

Nature is not perfect when judged by man’s rules for polluting the environment. Anyone with a pollen-associated sinus problem can readily attest to the impact that nature has on our daily lives.

And what about those greenhouse gases, the purported cause of global warming, where do they come from? A major source is volcanoes…. Satellite data after the 1991 eruptions of Mt Pinatubo (the Philippines) and Mt Hudson (Chile) showed a 15-20 percent ozone loss at high latitudes and a greater than 50 percent loss over the Antarctic! … The U.S. Geological Survey determined that: globally, large volcanic explosive eruptions that inject a tremendous volume of sulfur aerosols into the stratosphere can lead to lower surface temperatures and promote depletion of the Earth’s ozone layer. Ash from such events [as the Mt St Helens eruptions] can travel 100 miles or more downwind!

So despite what editorial writers, local environmentalists and movie makers want you to believe, its [sic] nature that is responsible for the greatest changes in nature, not human activity or President Bush’s environmental policies. I have often wondered how we mere humans could equal the magnitude of pollution created by nature no matter how many oil wells we drill, power plants we build, air-conditioners we operate, hair spray we use, cars we paint or SUV’s we drive.

The columnist’s thesis is that the lower air quality in southern Santa Barbara County may be due to natural methane seepage. Nothing in the article proves this, and almost none of the article even has anything to do with it.

• How does 71 tons of methane released daily compare to that released in areas that meet the air quality standards? And, for that matter, how do we know it is 71 tons? The quantity sounds very official, being so precise, though I doubt we have knowledge of the true amount that is precise within even ten tons.

• What does pollen have to do with it? Is the detection equipment unable to distinguish between pollen and methane?

• Is he suggesting that the air quality in Santa Barbara is worse than that in the northern end of the county due to volcanoes in the Philippines? How does ash traveling 100 miles from Mt St Helens in Washington affect Santa Barbara on the central coast of California?

• Is the ozone layer thinning over the Antarctic affecting the air quality in Santa Barbara? And what were those percentages in comparison to? The year before the eruption? Immediately before the eruption? The average thickness? Or volume? Or screening ability?

• Has he presented any evidence at all that nature contributes more to the air pollution than those things he lists in his conclusion? And, even if it does, does that make those things not responsible for the difference between the two ends of the county?

4. See if you can find an example of a semi-attached figure. They are generally easier to find in writing that is meant to be persuasive, such as advertising or commentaries.

Read Chapter 8 of How to Lie With Statistics. Answer the following questions and be ready to discuss the chapter in class.

1. What does post hoc mean?

2. List several kinds of correlation that might lead to post hoc reasoning.

3. The chapter also warns against the dangers of extrapolation. What is one example the author uses in that discussion?

4. Name two variables that you suspect would have correlation if we collected data, but for which a cause and effect relationship does not exist. (Hint: think of a common causal factor first, then select your two variables.) Be creative and make it as outrageous as possible.

5. Comment briefly on each of the following reports.

a. Teen drug use linked to truancy

Press Association

Tuesday July 29, 2003

Truants are more than five times as likely to take drugs than other schoolchildren, according to research published today. The survey found that 35% of pupils who had ever played truant admitted they had taken drugs in the last month, compared with just 6% who had not skipped school. The study of 10,000 schoolchildren aged 11 to 15 also found that truants were far more likely to regularly drink alcohol and smoke.

b. 20 November, 2003

Low self-esteem 'shrinks brain'

By Pallab Ghosh

BBC Science Correspondent

People with a low sense of self worth are more likely to suffer from memory loss as they get older, say researchers. The study, presented at a conference at the Royal Society in London, also found that the brains of these people were more likely to shrink compared with those who have a high sense of self esteem. Dr Sonia Lupien, of McGill University in Montreal surveyed 92 senior citizens over 15 years and studied their brain scans. She found that the brains of those with low self-worth were up to a fifth smaller than those who felt good about themselves. These people also performed worse in memory and learning tests.

c. Historical data show that the candidate who raises the most money before the primary season starts generally wins the nomination.

d. Winning World Cup lowers heart attack deaths

Apr 16, 2003 (Reuters) - Winning soccer's World Cup not only lifts a nation's spirits, it lowers the death rate from heart attacks, doctors said on Tuesday. During the 1998 World Cup when France defeated Brazil in the final, deaths from heart attacks in men and women dropped on the day of the match, which was watched by 26 million French TV viewers. Instead of about 33 deaths a day in the five days before and after the match, 23 men died of a heart attack on match day. There were also fewer deaths in women but the decrease was not as significant.

e. Church Attendance Boosts Immunity

Going to church may be good for the body as well as the soul. In a study of 1700 older Americans, researchers at Duke University Medical Center found that those who attended religious services had stronger immune responses. About 60% of the men and women surveyed attended religious services at least once a week. Blood tests showed that regular attendees were less likely to have high levels of an immune-system protein involved in age-related diseases. -- Article from Parade Magazine, 1997

Read Chapter 8 of How to Lie With Statistics. Answer the following questions and be ready to discuss the chapter in class.

1. What does post hoc mean?

Logical fallacy of believing that temporal (time) succession implies a causal relation.

2. List several kinds of correlation that might lead to post hoc reasoning.

a. Caused by chance. Another sample may have no correlation at all. (Cancer vs. milk, Pg 95-96.)

b. Real relationship, but uncertain which is the cause and which the effect. (Spinsterhood vs. college education? Pg 94).

c. Common cause. (Smoking vs. college grades, Pg 87; minister’s salary vs. price of rum, Pg 90.)
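The common-cause case in (c) can be illustrated with a small simulation. This is only a sketch with made-up data: the variable names, the noise levels, and the seed are assumptions chosen for illustration, not figures from the book. Two variables are each built from a shared hidden factor plus their own noise, and neither one influences the other.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(2003)  # fixed seed so the run is repeatable
n = 2000

# Hypothetical common cause (for example, the general price level).
common = [rng.gauss(0, 1) for _ in range(n)]

# Two variables that each depend on the common cause plus their own noise.
# Neither one appears in the formula for the other.
salaries = [c + rng.gauss(0, 0.5) for c in common]
rum_price = [c + rng.gauss(0, 0.5) for c in common]

r_raw = pearson_r(salaries, rum_price)

# Remove the common cause from each variable and correlate what is left.
r_adjusted = pearson_r(
    [s - c for s, c in zip(salaries, common)],
    [p - c for p, c in zip(rum_price, common)],
)

print(f"correlation of the two variables:        {r_raw:.2f}")
print(f"correlation after removing common cause: {r_adjusted:.2f}")
```

The first correlation should come out strong and the second close to zero, which is the pattern to expect when a third factor drives both variables.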

3. The chapter also warns against the dangers of extrapolation. What is one example the author uses in that discussion?

Rainfall vs. corn height; years of education vs. income. Pg 91.

4. Name two variables that you suspect would have correlation if we collected data, but for which a cause and effect relationship does not exist. (Hint: think of a common causal factor first, then select your two variables.) Be creative and make it as outrageous as possible.

Or simply think of two variables that have trends in the same direction without regard to the causes.

5. Comment briefly on each of the following reports.

a. Teen drug use linked to truancy

Press Association

Tuesday July 29, 2003

Truants are more than five times as likely to take drugs than other schoolchildren, according to research published today. The survey found that 35% of pupils who had ever played truant admitted they had taken drugs in the last month, compared with just 6% who had not skipped school. The study of 10,000 schoolchildren aged 11 to 15 also found that truants were far more likely to regularly drink alcohol and smoke.

This doesn’t mean that either one causes the other. It is likely that a common attitude increases the tendency toward both of these behaviors.

b. 20 November, 2003

Low self-esteem 'shrinks brain'

By Pallab Ghosh

BBC Science Correspondent

People with a low sense of self worth are more likely to suffer from memory loss as they get older, say researchers. The study, presented at a conference at the Royal Society in London, also found that the brains of these people were more likely to shrink compared with those who have a high sense of self esteem. Dr Sonia Lupien, of McGill University in Montreal surveyed 92 senior citizens over 15 years and studied their brain scans. She found that the brains of those with low self-worth were up to a fifth smaller than those who felt good about themselves. These people also performed worse in memory and learning tests.

Is it also possible that a physiological problem with the brain leads to a difficulty in thought processes or sense of well-being that in turn contributes to a feeling of low self-worth?

c. Historical data show that the candidate who raises the most money before the primary season starts generally wins the nomination.

Perhaps the candidate who is most popular (and is therefore going to win) is able to raise more money due to this popularity. This is one where the cause and effect are probably hopelessly entwined, with each variable contributing to the other.

d. Winning World Cup lowers heart attack deaths

Apr 16, 2003 (Reuters) - Winning soccer's World Cup not only lifts a nation's spirits, it lowers the death rate from heart attacks, doctors said on Tuesday. During the 1998 World Cup when France defeated Brazil in the final, deaths from heart attacks in men and women dropped on the day of the match, which was watched by 26 million French TV viewers. Instead of about 33 deaths a day in the five days before and after the match, 23 men died of a heart attack on match day. There were also fewer deaths in women but the decrease was not as significant.

If this was the only instance looked at, it could be coincidence. If many people took the day off to watch the final, the reduced stress could be the cause, regardless of the activity for which the day was taken off. In any case, I would question the headline’s suggestion that it was winning that lowered the deaths, since there is no mention of having looked at the heart attack death rate for the losing team’s nation.
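One rough way to check the coincidence explanation is sketched below. It rests on assumptions the article does not confirm: that the "about 33 deaths a day" is the comparable daily average for the same group as the 23 match-day deaths, that daily counts follow a Poisson distribution, and that days are independent.

```python
from math import exp, factorial

def poisson_cdf(k, mean):
    """P(X <= k) for a Poisson-distributed count with the given mean."""
    return sum(exp(-mean) * mean ** i / factorial(i) for i in range(k + 1))

usual_daily_deaths = 33   # "about 33 deaths a day" from the report
match_day_deaths = 23     # deaths reported on match day

# Chance that one ordinary day produces 23 or fewer deaths.
p_one_day = poisson_cdf(match_day_deaths, usual_daily_deaths)

# Chance that at least one of 365 ordinary days is that low.
p_some_day_in_year = 1 - (1 - p_one_day) ** 365

print(f"P(23 or fewer on a given day):     {p_one_day:.3f}")
print(f"P(some day that low in one year):  {p_some_day_in_year:.3f}")
```

Under these assumptions a day this low is uncommon for any one chosen date but nearly certain to turn up somewhere in a year, so a single match day is weak evidence by itself.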

e. Church Attendance Boosts Immunity

Going to church may be good for the body as well as the soul. In a study of 1700 older Americans, researchers at Duke University Medical Center found that those who attended religious services had stronger immune responses. About 60% of the men and women surveyed attended religious services at least once a week. Blood tests showed that regular attendees were less likely to have high levels of an immune-system protein involved in age-related diseases. -- Article from Parade Magazine, 1997

Or do those who feel better get out more, and therefore attend church more?

Read Chapter 9 of How to Lie With Statistics. Answer the following questions and be ready to discuss the chapter in class.

1. Who do you think are the people who are most likely to statisticulate, and for what purposes?

2. Which of the maps on Page 103 is drawn fairly, without statisticulation?

3. Why are percentages so often a source of statisticulation? (I’m starting to like this new word.)

4. Explain how the difference between two students at the 90th and 95th percentiles, five percentiles apart, can be (and usually is) greater than the difference between two students at the 50th and 60th percentiles, ten percentiles apart.

5. Summarize the chapter in one or two sentences.

Read Chapter 9 of How to Lie With Statistics. Answer the following questions and be ready to discuss the chapter in class.

1. Who do you think are the people who are most likely to statisticulate, and for what purposes?

a. The media, to sensationalize.

b. Politicians, to win your vote.

c. Advertisers, to sell their product.

d. Anyone trying to convince you of something, especially if they stand to profit from your becoming convinced.

2. Which of the maps on Page 103 is drawn fairly, without statisticulation?


3. Why are percentages so often a source of statisticulation? (I’m starting to like this new word.) (Pg 109)

a. They can mask very small and therefore largely meaningless samples by hiding the counts.

b. They can mean very different things depending on what is chosen as the base, which is often not even stated.

c. So many people don’t understand them in the first place, on both the writers’ and readers’ ends of the communication.
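Points (a) and (b) can be made concrete with a few lines of arithmetic. The wage figures and the three-person sample below are hypothetical numbers chosen for illustration, not examples taken from the book.

```python
def percent_change(old, new):
    """Percentage change from old to new, using old as the base."""
    return (new - old) / old * 100

# Hypothetical wage: cut from 100 to 50, then raised to 75.
cut = percent_change(100, 50)      # base is the original wage
raise_ = percent_change(50, 75)    # base is the reduced wage

# The same raise quoted against the original wage instead.
raise_on_original_base = (75 - 50) / 100 * 100

# Hypothetical tiny sample: 1 "yes" out of 3 people surveyed.
small_sample_pct = 1 / 3 * 100

print(f"cut: {cut:.0f}%, raise: {raise_:.0f}% (on the reduced wage)")
print(f"same raise on the original base: {raise_on_original_base:.0f}%")
print(f"1 out of 3 reported as a percentage: {small_sample_pct:.0f}%")
```

A "50% cut" followed by a "50% raise" leaves the wage at 75, not 100, because the two percentages use different bases. And "33%" sounds far more substantial than "one person out of three."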

4. Explain how the difference between two students at the 90th and 95th percentiles, five percentiles apart, can be (and usually is) greater than the difference between two students at the 50th and 60th percentiles, ten percentiles apart.

(Pg 115). Because most measures of achievement are roughly normally distributed, the performance levels of the students will be grouped most heavily about the mean. Percentiles, however, are a ranking system, so the achievement levels of students in the middle range of percentiles are grouped more closely together than those at the extremes. The 90th and 95th percentiles correspond roughly to z-scores of 1.28 and 1.64, respectively, a difference of 0.36 SD. The 50th and 60th percentiles correspond to z-scores of about 0 and 0.25, a difference of only 0.25 SD.
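The z-scores quoted above can be checked with Python's standard library. This sketch assumes, as the answer does, that achievement is normally distributed; the helper name `z_score` is just an illustrative choice.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def z_score(percentile):
    """z-score below which the given percentage of a normal population falls."""
    return standard_normal.inv_cdf(percentile / 100)

z50, z60, z90, z95 = (z_score(p) for p in (50, 60, 90, 95))

gap_middle = z60 - z50   # ten percentiles apart, near the mean
gap_top = z95 - z90      # five percentiles apart, in the upper tail

print(f"50th to 60th percentile: {gap_middle:.2f} SD")
print(f"90th to 95th percentile: {gap_top:.2f} SD")
```

The smaller percentile gap in the tail spans the larger distance in standard deviations, which is the point of the question.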

5. Summarize the chapter in one or two sentences.

Many use statistics to distort or hide the truth, sometimes through ignorance, but often intentionally. Be wary when encountering statistics!

Read Chapter 10 of How to Lie With Statistics. Answer the following questions and be ready to discuss the chapter in class.

1. As we look to see who is offering a statistic for our consumption, what should we look for?

2. If a respectable organization is cited as a source of a statistic, what do we still need to consider about that authority?

3. The example of the survey on prices during the Korean War brought out what can be a subtle way in which a sample can be biased. What is that source of bias and what is the name we give to samples likely to be biased in that manner?

4. What are some of the many critical things that are commonly missing when a statistic is reported in the media?

5. One of the more subtle ways for somebody to “change the subject” has to do with statistics based on what people say. What is the difficulty that it is essential we look out for with such a statistic?

6. Summarize the five questions we can ask to defend ourselves as a consumer of statistics. (Explain briefly what each question is about.)

Read Chapter 10 of How to Lie With Statistics. Answer the following questions and be ready to discuss the chapter in class.

1. As we look to see who is offering a statistic for our consumption, what should we look for?

Bias, conscious or unconscious (Pg 123).

2. If a respectable organization is cited as a source of a statistic, what do we still need to consider about that authority?

That it stands behind the information, and not merely alongside it (Pg 125).

3. The example of the survey on prices during the Korean War brought out what can be a subtle way in which a sample can be biased. What is that source of bias and what is the name we give to samples likely to be biased in that manner?

The sample was biased by the large number who failed to respond (nonresponse bias). This is the difficulty with a voluntary response sample.

4. What are some of the many critical things that are commonly missing when a statistic is reported in the media?

a. Number of cases.

b. Confidence level or level of significance.

c. Variability.

d. What variety of average is being used.

e. Something to compare the figure to (mothers over 35; 2800 deaths during fog, Pg 128).

f. Raw numbers to put percentages in perspective (33% of coeds married faculty, Pg 128).

g. Some indication of distribution to go with averages (stock held by 3,003 persons with average of 660 shares each vs. three men held 3/4 of the stock, Pg 129).

h. Base for an index (Pg 129-130).

i. The factor responsible for the change (Easter in a different month; a change in reporting procedures, lifespan, populations, diagnostic techniques, Pg 130).
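Item (g) shows how much an average can hide. The arithmetic below works through the stock example using the figures cited there, under the assumption (not stated in the answer above) that the three large holders are counted among the 3,003 persons.

```python
holders = 3003
mean_shares = 660
total_shares = holders * mean_shares

big_holders = 3
big_holder_shares = total_shares * 3 // 4   # three men hold 3/4 of the stock

# Everyone else shares the remaining quarter.
other_holders = holders - big_holders
other_mean = (total_shares - big_holder_shares) / other_holders
big_mean = big_holder_shares / big_holders

print(f"total shares:                  {total_shares:,}")
print(f"mean for the three big holders: {big_mean:,.0f}")
print(f"mean for the other {other_holders:,}:       {other_mean:,.1f}")
```

The reported average of 660 shares describes neither group: the three large holders average hundreds of thousands of shares each, while the remaining holders average far fewer than 660.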

5. One of the more subtle ways for somebody to “change the subject” has to do with statistics based on what people say. What is the difficulty that it is essential we look out for with such a statistic?

Often people either don’t know the information they are reporting and merely estimate, or they lie for any of a number of reasons. (There were more people 65 to 70 years old in 1950 than there were people 55 to 60 ten years earlier, Pg 134.)

6. Summarize the five questions we can ask to defend ourselves as a consumer of statistics. (Explain briefly what each question is about.)

a. Who says so? Is there likely to be bias in either the one analyzing the data or reporting the statistic? Is the cited authority really standing behind the statistic?

b. How does he know? Is there likely to be bias in the sample? Is it representative?

c. What’s missing? Do we have everything we need to know in order to fully understand the significance of the statistic that is being offered?

d. Did somebody change the subject? Are definitions of all terms fully understood, and consistent for any comparisons? Is the data likely to be accurate, or was there opportunity and reason for the subjects to lie? Is correlation being represented as causation?

e. Does it make sense? Is it believable? Or are we being blinded by the seemingly sophisticated analysis and scientific-sounding statistic? Is it reasonable to extrapolate this far? (Mark Twain’s prediction for the Mississippi River, Pg 142.)





