Triola A - PBworks



Triola Assignment B

Section 4-2 – Basic Skills and Concepts

1) What does it mean when we say that “the probability of winning the grand prize in the Illinois lottery is 1/20358520”? Is such a win unusual?

A parameter is a value (number) that refers to the entire population being studied. A statistic is a value (number) that refers to a sample of a larger population. Check: OK

3) Determine whether the given value is a statistic or a parameter: A sample of households is selected and the average (mean) number of people per household is 2.58.

2.58 is a statistic because it is the mean for a sample of the larger population. Check: OK

5) Determine whether the given values are from a discrete or continuous data set: In the Chapter Problem, it was noted that when 50 letters were sent as part of an experiment, three of them arrived at the target address.

The values are from a discrete set because it does not make sense in the context of the problem to have a decimal part of a letter. Check: OK

7) Determine which of the four levels of measurement (nominal, ordinal, interval, ratio) is most appropriate: The years of cicada emergence: 1936, 1953, 1970, 1987, and 2004.

The years would be considered an interval measurement since it is possible to determine differences between the various years, but there is no zero level that represents zero time. Check: OK

Section 1-3 – Basic Skills and Concepts

1) What is a voluntary response sample, and why is it generally unsuitable for methods of statistics?

A voluntary response sample is a collection of participants who determine for themselves whether or not to participate in a study. Examples of voluntary response samples might include: individuals who respond to an Internet survey, individuals who respond to a mail-in survey, etc.

Voluntary response samples are generally unsuitable because they lack the key characteristic of randomness. In addition, there is no guarantee that the participants who make up the sample accurately reflect the composition of the larger population. In general, the results obtained from a voluntary response sample cannot be accurately generalized to the larger population. The sample is likely to be biased. Check: OK after adding in phrase about bias.

5) Use critical thinking to develop an alternative conclusion: Based on a study of heights of men and women who play basketball, a researcher concludes that the exercise from playing basketball causes people to grow taller.

Based on the characteristics and requirements of the game of basketball, athletes who are taller generally tend to play basketball. Check: OK

17) An economist randomly selects 10 wage earners from each of the 50 states. For each state, he finds the average of the annual incomes, and he then adds those 50 values and divides by 50. Is the result likely to be a good estimate of the average (mean) of all wage earners in the United States? Why or why not?

The economist found the average of the 10 people in a state in order to determine the mean annual income for that state. After determining the averages for each of the 50 states, he found the average of the averages in order to determine a mean for the entire country. Although his procedure was mathematically correct, the process has several flaws with regard to his sample.

First, the sample size is much too small to give an accurate reflection of the entire population. Even if he chose to find a single average of all 500 people, 500 is too small a number to accurately represent a population of 300,000,000.

Second, each state has different characteristics with respect to population, socioeconomic status, job market, geography, and so forth. For example, California has a significantly higher population than Rhode Island. Thus, choosing 10 people from Rhode Island and 10 people from California does not accurately reflect the characteristics of the population. Check: OK
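The weighting problem in the second point can be illustrated with a small sketch. The state labels, earner counts, and mean incomes below are hypothetical values chosen only for illustration; they are not real data and do not come from the exercise.

```python
# Hypothetical figures for illustration only (not real state data).
states = {
    "Large state": {"earners": 18_000_000, "mean_income": 60_000},
    "Small state": {"earners": 500_000, "mean_income": 40_000},
}

# The economist's procedure: a plain average of the state averages.
unweighted = sum(s["mean_income"] for s in states.values()) / len(states)

# A population-weighted average: each state's mean counts in proportion
# to its number of wage earners.
total_earners = sum(s["earners"] for s in states.values())
weighted = sum(s["earners"] * s["mean_income"] for s in states.values()) / total_earners

print(unweighted)
print(round(weighted, 2))
```

With these made-up numbers the plain average of averages lands well below the weighted mean, because the small state counts as much as the large one.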

Section 1-4 – Basic Skills and Concepts

1) What is the difference between a random sample and a simple random sample?

A random sample is one in which every individual in a population has an equal chance of being selected. For example, suppose that the population consists of every student in the school. If a sample of 50 students is taken, every student in the school has an equal chance of being chosen. A simple random sample is one in which every possible sample of a given size has an equal chance of being selected. From the previous example, every possible combination of 50 students has an equal chance of being selected. It would not be a simple random sample if the students were placed into permanent groups of 10 and 5 groups were randomly selected. Check: OK

5) Determine whether the given description corresponds to an observational study or an experiment: Nine-year-old Emily Rosa became an author of an article in the Journal of the American Medical Association after she tested professional touch therapists. Using a cardboard partition, she held her hand above the therapist’s hand, and the therapist was asked to identify the hand that Emily chose.

This would be considered an observational study since the participants are being observed but not changed by the procedure. Check: My original thought was that it was an experiment. I overlooked the condition that the participant or subject is treated (or modified) in an experiment. OK

9) Identify the type of observational study (cross-sectional, retrospective, prospective): A researcher from Mt. Sinai Hospital in New York City plans to obtain data by following (to the year 2015) siblings of victims who perished in the World Trade Center terrorist attack of September 11, 2001.

This is an example of a prospective (or longitudinal) study since the researcher plans to follow the participants for an extended period of time and collect data at some point or points in the future. Check: OK

21) Identify which type of sampling is used: In a Gallup poll of 1059 adults, the interview subjects were selected by using a computer to randomly generate telephone numbers that were called.

This is an example of random sampling since the phone numbers are randomly determined. Check: OK

Unit 1 – Review Exercises

1) Shortly after the World Trade Center towers were destroyed by terrorists, America Online ran a poll of its Internet subscribers and asked this question: “Should the WTC towers be rebuilt?” Among the 1,304,240 responses, 768,731 answered yes, 286,756 answered no, and 248,753 said that it was too soon to decide. Given that this sample is extremely large, can the responses be considered to be representative of the population of the U.S.? Explain.

Because the sampling process involved voluntary response, it cannot be assumed that the sample is representative of the entire population. Most likely, people who had strong feelings about the issue were the ones who responded. In addition, the poll was only available to those with Internet access. Check: OK

3) Identify the level of measurement used in each of the following?

a) The weight of people being hurled through the air…

Continuous Ratio Check: OK

b) A movie critic’s ratings of “must see, recommended, not recommended…”

Discrete Ordinal Check: OK

c) A movie critic’s classification of “drama, comedy, adventure”

Discrete Nominal Check: OK

d) Bob, who is different in many ways, measures time in days, with 0 corresponding to his birth date…

Discrete Interval Check: OK

5) Identify the type of sampling used when a sample of the 366000 Coke shareholders is obtained as described. Then determine if the sample is representative of the population:

a) A complete list is compiled and every 500th name is selected

Systematic – This will be representative. Check: OK

b) At the annual stockholders’ meeting, a survey is conducted of all who attend

Convenience – The sample will be representative depending on the number of stockholders who attend. However, if only those who care the most about the meeting attend, then the sample may not be representative. Check: OK

c) Fifty different stockbrokers are randomly selected, and a survey is made of their clients…

Stratified – This will not be representative because different stockbrokers may have different numbers of clients. Check: The answer is clustered since stockholders are grouped by stockbroker and then sampled. The sample is not representative.

d) A computer file of all stockholders is compiled and numbered and a computer generates random numbers to select the sample…

Random – This will be representative. Check: OK

e) All of the stockholder zip codes are collected and 5 stockholders are randomly selected from each zip code

Clustered – This will not be representative since there may be a larger concentration of stockholders in one zip code versus another (e.g. urban areas versus rural areas). Check: The sampling is stratified since the stockholders are grouped by zip code and then randomly sampled. The sample is not representative.

Unit 1 – Cumulative Review Exercises

1) Sum = 3.0630 + 3.0487 + 2.9149 + 3.1358 + 2.9753 = 15.1377 Check: OK

Mean = 15.1377 ÷ 5 = 3.02754 Check: OK

3) [pic] Check: OK

5) [pic] Check: OK

Section 2-2 – Basic Skills and Concepts

1) What is a frequency distribution and why is it useful?

A frequency distribution uses some type of method for listing data values and their corresponding counts (or frequencies). A frequency distribution is useful for organizing data, looking for patterns, and visualizing the data. Check: OK

5) Identify the class width, class midpoints, and class boundaries for the frequency distribution:

|Daily Low Temp (°F) |Frequency |
|35-39 |1 |
|40-44 |3 |
|45-49 |5 |
|50-54 |11 |
|55-59 |7 |
|60-64 |7 |
|65-69 |1 |

Class Width: 5

Class Midpoints: 37, 42, 47, 52, 57, 62, and 67

Class Boundaries: 34.5, 39.5, 44.5, 49.5, 54.5, 59.5, 64.5, and 69.5 Check: OK
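The class width, midpoints, and boundaries above can be reproduced from the class limits in the table. This is a minimal sketch; the variable names are illustrative and not from the text.

```python
# Lower and upper class limits copied from the frequency distribution.
lower_limits = [35, 40, 45, 50, 55, 60, 65]
upper_limits = [39, 44, 49, 54, 59, 64, 69]

# Class width: difference between consecutive lower class limits.
class_width = lower_limits[1] - lower_limits[0]

# Class midpoint: average of the lower and upper limit of each class.
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

# Class boundaries split the gap between one class's upper limit and the
# next class's lower limit (a gap of 1 here, so each boundary is +/- 0.5).
gap = lower_limits[1] - upper_limits[0]
boundaries = [lower_limits[0] - gap / 2] + [hi + gap / 2 for hi in upper_limits]

print(class_width)
print(midpoints)
print(boundaries)
```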

9) Does the frequency distribution given in Exercise 5 appear to have a normal distribution?

The two general criteria for a normal distribution are: 1) frequencies start low, reach a maximum, and then finish low; and 2) symmetry. In this case, the distribution does appear to be normal. Check: OK

Section 2-3 – Basic Skills and Concepts

1) What important characteristic of data can be better understood through examination of a histogram?

A histogram gives a visual representation of the shape (i.e. normal, skewed, etc.) of the distribution. Check: OK

5) How many crew members are included in the histogram (on page 54 of the text)?

2 + 10 + 5 + 1 = 18 crew members Check: OK

11) Refer to Exercise 19 in Section 2-2 and use the frequency distribution to construct a histogram. Do the data appear to be normal?

|Rainfall (Inches) |Frequency |
|0.00-0.24 |46 |
|0.25-0.49 |5 |
|0.50-0.74 |0 |
|0.75-0.99 |0 |
|1.00-1.24 |0 |
|1.25-1.49 |1 |

[pic]

The two general criteria for a normal distribution are: 1) frequencies start low, reach a maximum, and then finish low; and 2) symmetry. In this case, the distribution does not appear to be normal since it is skewed to the right: almost all of the values fall in the lowest class, with a long tail toward higher rainfall amounts. Check: OK

15) Refer to Exercise 23 in Section 2-2 and use the frequency distribution for the weights of the pre-1983 pennies to construct a histogram. Do the weights appear to be normal?

|Coin Weights (Grams) |Frequency |
|2.9500-2.9999 |2 |
|3.0000-3.0499 |3 |
|3.0500-3.0999 |22 |
|3.1000-3.1499 |7 |
|3.1500-3.1999 |1 |

[pic]

The two general criteria for a normal distribution are: 1) frequencies start low, reach a maximum, and then finish low; and 2) symmetry. In this case, the distribution does appear to be normal. Check: OK

17) Refer to Table 2-8 and use the relative frequency distribution for the best actors to construct a relative frequency histogram. Do the two genders appear to win Oscars at different ages?

[pic]

Although there are similarities between the graphs, it appears that men tend to win Oscars at slightly older ages than women. Check: OK

Section 2-4 – Basic Skills and Concepts

1) What is the main objective in graphing data?

The main objective of a graph is to visually depict data in a manner that emphasizes the key characteristics or features of the data. Check: OK The graph can also show the distribution, outliers, and so forth.

9) Use the heights (Data Set 11) to construct a stemplot. What does the stemplot suggest about the distribution of heights?

|Height of Eruptions of Old Faithful |
|Stems (Tens) |Leaves (Ones) |
| 9 | 55 |
|10 | |
|11 | 00055 |
|12 | 000000005555 |
|13 | 0000000066668 |
|14 | 000088 |
|15 | 00 |

The distribution of the eruption heights of Old Faithful appears to be approximately normal. Check: OK

17) Use the data to create a scatter diagram. In Data Set 3, use tar for the horizontal scale and use carbon monoxide (CO) for the vertical scale. Determine whether there appears to be a relationship between cigarette tar and CO. If so, describe the relationship.

[pic]

In general, it appears that as the amount of tar increases, the amount of carbon monoxide (CO) also increases. Check: OK

Unit 2 – Review Exercises

1) Construct a frequency distribution of the ages of the Oscar-winning actors listed in Table 2-1. Use the same class intervals that were used for the actresses. How does the result compare to the frequency distribution for actresses?

|Frequency Distribution: Ages of Best Actors |
|Age of Actor |Frequency |
|21-30 |3 |
|31-40 |25 |
|41-50 |30 |
|51-60 |14 |
|61-70 |3 |
|71-80 |1 |

It appears that the distribution for the actors is centered at a value that is slightly higher than for actresses, which means that males tend to win Oscars at older ages than females. Check: OK

3) Construct a dotplot of the ages of the Oscar-winning actors listed in Table 2-1. How does the result compare to the dotplot for actresses?

[pic]

Although the shape of the distribution is similar to the dotplot for the actresses, the values for the males tend to be concentrated at a higher age. Check: OK

5) Refer to Table 2-1 and use only the first 10 ages of actresses and the first 10 ages of the actors. Construct a scatterplot. Based on the result, does there appear to be an association between the ages of actresses and the ages of actors?

[pic]

The points do not form any type of consistent pattern (i.e. a line, parabolic curve, etc.). Therefore, there does not appear to be an association between the ages of the two groups. Check: OK I needed to reference the answer in order to clarify how the variables were related.

Unit 2 – Cumulative Review Exercises

1) Consider the numbers that result from spins. Do those numbers measure or count anything?

No. These numbers are the values that are obtained. The number of times (the frequency) each value is spun is what is counted. Check: OK

3) Examine the distribution table. Given that the last class summarizes results from three slots, is its frequency approximately consistent with the results that would be expected from an unbiased roulette wheel? In general, do the frequencies suggest that the wheel is unbiased?

The other classes each represent five spaces on the wheel. Since the last class only involves three slots, this is 60% of the size of the other classes. If you divide 25 (the frequency) by 0.6, you get a value of 41.666666 or 42. This value seems to be relatively consistent with the other values.

Given the fact that only 380 spins were used, I would say that the roulette wheel is unbiased. In the long run, the values should even out and be consistent for all of the classes. Check: OK
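The scaling step used in this answer (dividing the last class's frequency by 0.6) can be written out as a short sketch. The variable names are illustrative only.

```python
# Values taken from the answer above.
last_class_frequency = 25
slots_in_last_class = 3
slots_in_other_classes = 5

# Scale the 3-slot class up to the equivalent of a 5-slot class so that
# its frequency can be compared directly with the other classes.
scaled_frequency = last_class_frequency * slots_in_other_classes / slots_in_last_class

print(round(scaled_frequency, 2), round(scaled_frequency))
```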

Section 3-2 – Basic Skills and Concepts

1) In what sense are the mean, median, mode, and midrange measures of “center”?

Each of these measurements attempts to give an indication of the value that a distribution is centered around. The mean locates the center by dividing the sum of all values by the number of values in the distribution. The median is simply the middle number in an ordered list of values. The mode is the value that appears the most. The midrange is the average of the two extreme (maximum and minimum) values. Check: OK

9) Find the mean, median, mode, and midrange. Fourteen different second-year medical students at Bellevue Hospital measured the blood pressure of the same person. The systolic readings are listed. What is notable about this data set?

Mean: 1875 ÷ 14 ≈ 133.9 Check: OK

Median: 120 120 125 130 130 130 130 135 138 140 140 143 144 150

(130 + 135) ÷ 2 = 132.5 Check: OK

Mode: 130 Check: OK

Midrange: (120 + 150) ÷ 2 = 135 Check: OK

In general, all four measures of central tendency (mean, median, mode, and midrange) are fairly close. This would tend to indicate that the distribution is relatively symmetric. It is approximately normal. Check: When looking at values themselves, it is interesting that the values vary as much as they do since the blood pressure was taken on the same person. OK
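The four measures of center for the fourteen readings listed in the Median line can be checked with Python's standard `statistics` module. This is a sketch for verification, not the method used in the original answer; the variable names are illustrative.

```python
from statistics import mean, median, multimode

# Systolic readings as listed (already sorted) in the exercise.
readings = [120, 120, 125, 130, 130, 130, 130, 135, 138, 140, 140, 143, 144, 150]

center_mean = mean(readings)        # sum of the values divided by their count
center_median = median(readings)    # average of the two middle values (n is even)
center_modes = multimode(readings)  # list of the most frequent value(s)
center_midrange = (min(readings) + max(readings)) / 2

print(round(center_mean, 1), center_median, center_modes, center_midrange)
```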

17) Waiting times of customers at Jefferson Valley Bank (in one line) and the Bank of Providence (in three lines) are listed. Determine whether there is a difference between the two data sets that is not apparent from a comparison of the measures of center. If so, what is it?

In both sets of the data, the mean is 7.15 minutes, the median is 7.2, the mode is 7.7, and the midrange is 7.1. The measures of central tendency would indicate that the data sets are essentially the same. However, the wait times for Jefferson Valley do not vary as much as the wait times for the Bank of Providence. The range (difference between the maximum and minimum values) for Jefferson Valley is only 1.2 minutes whereas the range for Bank of Providence is 5.8 minutes. This means that persons waiting in line at Jefferson Valley can anticipate a consistent wait time, but persons at the Bank of Providence could have wait times that are short or long. Check: OK

Section 3-3 – Basic Skills and Concepts

1) Why is standard deviation considered a measure of variation? In your own words, describe the characteristic of a data set that is measured by the standard deviation.

Standard deviation measures variation because it is essentially an average of the differences between each value and the mean for the data set. The standard deviation gives an indication of the number (or percentage) of values in the data set that are within a certain range of the mean. Check: OK

5) Find the range, variance, and standard deviation for the given sample data. Answer the question: Statistics students participated in an experiment to test their ability to determine when 1 minute has passed. The results are given. Identify at least one good reason why the standard deviation from the sample might not be a good estimate of the standard deviation for the population of adults.

Range: [pic] Check: OK

SD: [pic]

[pic]

[pic]

Check: OK

Variance: [pic] Check: OK

The sample standard deviation is not a good estimate of the population standard deviation because it comes from a very small sample (n = 6). Check: OK

13) Find the range, variance, and standard deviation for the two data samples. Answer the question: Statistics are sometimes used to compare or identify authors of different works. The lengths of the first 20 words in the foreword written by Tennessee Williams in Cat on a Hot Tin Roof and the first 20 words in The Cat in the Hat by Dr. Seuss are listed. Does there appear to be a difference in variation?

Cat on a Hot Tin Roof Check: OK

Range: [pic]

SD: [pic]

[pic]

[pic]

Variance: [pic]

The Cat in the Hat Check: OK

Range: [pic]

SD: [pic]

[pic]

[pic]

Variance: [pic]

There does appear to be a difference in variation. The sample from The Cat in the Hat suggests that its variation in word length is much smaller than the variation in word length in Cat on a Hot Tin Roof. Check: OK

17) Find the range, variance, and standard deviation for the two data samples. Answer the question: Waiting times of customers at the Jefferson Valley Bank and the Bank of Providence are listed. Compare the variation in the two data sets.

Jefferson Valley Bank Check: OK I had to fix a minor calculation error.

Range: [pic]

SD: [pic]

[pic]

[pic]

Variance: [pic]

Bank of Providence Check: OK

Range: [pic]

SD: [pic]

[pic]

[pic]

Variance: [pic]

Due to the fact that the values are generally closer together, the variation in wait times for Jefferson Valley Bank is much lower than for the Bank of Providence. Check: OK

Section 3-4 – Basic Skills and Concepts

1) A value from a large data set is found to have a z-score of –2. Is the value above the mean or below the mean? How many standard deviations away from the mean is this value?

A value with a z-score of –2 is two standard deviations below the mean. Check: OK

3) For a large data set, the first quartile, Q1, is found to be 15. What does it mean when we say that 15 is the first quartile?

This means that about 25% of the values in the data set are below 15. About 75% of the scores are above 15. Check: OK

9) Human body temperatures have a mean of 98.20 °F and a standard deviation of 0.62 °F. Convert the given temperatures to z-scores.

a) 97.50 °F

z = (97.50 – 98.20) ÷ 0.62 ≈ –1.13 Check: OK

b) 98.60 °F

z = (98.60 – 98.20) ÷ 0.62 ≈ 0.65 Check: OK

c) 98.20 °F

z = (98.20 – 98.20) ÷ 0.62 = 0 Check: OK

13) Which is relatively better: a score of 85 on a psychology test or a score of 45 on an economics test? Scores on the psychology test have a mean of 90 and a standard deviation of 10. Scores on the economics test have a mean of 55 and standard deviation of 5.

In order to compare the scores, they need to be converted to standard scores (z-scores).

Psychology: z = (85 – 90) ÷ 10 = –0.5

Economics: z = (45 – 55) ÷ 5 = –2

Since the score on the psychology test is only half of a standard deviation below the mean, it is a better score than the score for the economics test which is two standard deviations below the mean. Check: OK I had to correct a minor calculation error on the z-score for the economics test.
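The z-score conversions in exercises 9 and 13 of this section follow the same formula, z = (value – mean) ÷ standard deviation. A minimal sketch follows; the function name `z_score` is an illustrative choice, not something from the text.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Exercise 9: body temperatures with mean 98.20 and standard deviation 0.62.
for temp in (97.50, 98.60, 98.20):
    print(temp, round(z_score(temp, 98.20, 0.62), 2))

# Exercise 13: put both test scores on a common scale before comparing them.
z_psychology = z_score(85, 90, 10)
z_economics = z_score(45, 55, 5)
better = "psychology" if z_psychology > z_economics else "economics"
print(z_psychology, z_economics, better)
```

The larger z-score marks the relatively better result, even when both scores fall below their respective means.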

Section 3-5 – Basic Skills and Concepts

1) Refer to the STATDISK-generated boxplot. What do the values of 2, 5, 10, 12, and 20 tell us about the data set from which the boxplot was constructed?

2 is the minimum value in the data set.

5 is the first quartile (Q1) of the data set.

10 is the median or second quartile (Q2) of the data set.

12 is the third quartile (Q3) of the data set.

20 is the maximum value in the data set. Check: OK

3) The two boxplots shown below correspond to the service times from two different companies that repair air conditioning units. They are shown on the same scale. The top boxplot corresponds to the Sigma Air Conditioning Company, and the bottom boxplot corresponds to the Newport Repair Company. Which company has less variation in repair times? Which company should have more predictable costs? Why?

The Sigma Company has less variation in repair time. The Sigma company should have more predictable costs because they can budget for and charge for labor in a more consistent manner. Due to the fact that there is less variation, the company can more reliably predict how long repairs will take. Check: OK

5) In 1908, Gosset published an article. He included the data listed below for two different types of corn seed that were used on adjacent plots of land. The listed values are the yields of head corn in pounds per acre. Using the yields from regular seed, find the 5-number summary and construct a boxplot.

The data can be organized as follows:

1316 1444 1511 1612 1903 1910 1935 1961 2060 2108 2496

Minimum: 1316

Maximum: 2496

First Quartile: 1511

Median: 1910

Third Quartile: 2060 Check: OK

The boxplot for this data would look like this:

[pic]
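The 5-number summary above can be reproduced with a short sketch. The answer does not state which quartile rule was used, so the code assumes a common introductory-textbook convention: compute a locator L = (k ÷ 100) · n; if L is a whole number, average the L-th and next values, otherwise round L up. The function name `percentile` is hypothetical.

```python
def percentile(sorted_values, k):
    """Return the k-th percentile (0 < k < 100) of an already sorted list.

    Assumed convention: locator L = (k / 100) * n. If L is a whole number,
    average the L-th and (L + 1)-th values; otherwise round L up and take
    that value (positions counted from 1).
    """
    n = len(sorted_values)
    whole, remainder = divmod(k * n, 100)  # integer arithmetic avoids float error
    if remainder == 0:
        return (sorted_values[whole - 1] + sorted_values[whole]) / 2
    return sorted_values[whole]  # 1-based position whole + 1

# Regular-seed yields as listed in the answer.
yields = sorted([1316, 1444, 1511, 1612, 1903, 1910, 1935, 1961, 2060, 2108, 2496])

five_number = (
    yields[0],               # minimum
    percentile(yields, 25),  # first quartile
    percentile(yields, 50),  # median
    percentile(yields, 75),  # third quartile
    yields[-1],              # maximum
)
print(five_number)
```

Other software may use a different quartile rule and report slightly different Q1 and Q3 values for the same data.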

Unit 3 – Review Exercises

1) In a study of the relationship between heights and trunk diameters of trees, botany students collected sample data. Listed below are the tree circumferences. Using the circumferences, find the mean, median, mode, midrange, range, standard deviation, variance, first quartile, third quartile, and tenth percentile.

Mean: 90.7 ÷ 20 = 4.535 Check: OK

Median: 1.8 1.8 1.9 2.4 3.1 3.4 3.7 3.7 3.8 3.9 4.0 4.1 4.9 5.1 5.1 5.2 5.3 5.5 8.3 13.7

(3.9 + 4.0) ÷ 2 = 3.95 Check: OK

Mode: Multimodal: 1.8, 3.7, and 5.1 Check: OK

Midrange: (1.8 + 13.7) ÷ 2 = 7.75 Check: OK

Range: 13.7 – 1.8 = 11.9 Check: OK

SD: [pic]

[pic]

[pic]

Check: OK

Variance: [pic] Check: OK

Q1: (3.1 + 3.4) ÷ 2 = 3.25

Check: OK

Q3: (5.1 + 5.2) ÷ 2 = 5.15

Check: OK

P10: [pic]

Check: OK
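Most of the statistics requested in this exercise can be checked against the twenty circumferences listed in the Median line using Python's standard `statistics` module. This is a verification sketch with illustrative names; the quartiles and tenth percentile are left out because their values depend on which percentile convention is assumed.

```python
from statistics import mean, median, multimode, stdev, variance

# Tree circumferences as listed in the exercise.
circumferences = [1.8, 1.8, 1.9, 2.4, 3.1, 3.4, 3.7, 3.7, 3.8, 3.9,
                  4.0, 4.1, 4.9, 5.1, 5.1, 5.2, 5.3, 5.5, 8.3, 13.7]

summary = {
    "mean": mean(circumferences),
    "median": median(circumferences),
    "modes": multimode(circumferences),  # every value tied for most frequent
    "midrange": (min(circumferences) + max(circumferences)) / 2,
    "range": max(circumferences) - min(circumferences),
    "sd": stdev(circumferences),         # sample standard deviation (divides by n - 1)
    "variance": variance(circumferences),  # sample variance (divides by n - 1)
}

for name, value in summary.items():
    print(name, value)
```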

3) Using the same data set as question 1, construct a frequency distribution. Use seven classes with 1.0 as the lower limit, and use a class width of 2.0.

|Frequency Distribution: Tree Circumferences |
|Circumference |Frequency |
|1.0-2.9 |4 |
|3.0-4.9 |9 |
|5.0-6.9 |5 |
|7.0-8.9 |1 |
|9.0-10.9 |0 |
|11.0-12.9 |0 |
|13.0-14.9 |1 |

Check: OK
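The class counts in this frequency distribution can be tallied with a short sketch. The data are the circumferences listed in question 1; the variable names are illustrative.

```python
# Tree circumferences from question 1.
circumferences = [1.8, 1.8, 1.9, 2.4, 3.1, 3.4, 3.7, 3.7, 3.8, 3.9,
                  4.0, 4.1, 4.9, 5.1, 5.1, 5.2, 5.3, 5.5, 8.3, 13.7]

lower_limit = 1.0   # lower limit of the first class
class_width = 2.0
num_classes = 7

# Tally each value into the class whose interval contains it.
counts = [0] * num_classes
for x in circumferences:
    index = int((x - lower_limit) // class_width)
    counts[index] += 1

# Print the table with class limits given to one decimal place.
for i, count in enumerate(counts):
    lo = lower_limit + i * class_width
    hi = lo + class_width - 0.1
    print(f"{lo:.1f}-{hi:.1f}", count)
```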

7) Using the same data set as question 1, construct a boxplot and identify the 5-number summary values.

Minimum: 1.8

Maximum: 13.7

First Quartile: 3.25

Median: 3.95

Third Quartile: 5.15 Check: OK

The boxplot for this data would look like this: Check: OK

[pic]

[pic: dotplot "Ages of Best Actors Oscar Award", axis from 25 to 80]

[pic: "Boxplot for Tree Circumferences", axis from 1.0 to 15.0, marked values 1.8, 3.25, 3.95, 5.15, 13.7]

[pic: "Boxplot for Regular Corn Seed Yields", axis from 1300 to 2500, marked values 1316, 1511, 1910, 2060, 2496]
