TPS Chapter 1: Exploring Data
|Q1. The science of data is known as ____. |A1. Statistics |
|Q2. Most raw data sets can be organized into rows and columns. Each row represents some|A2. Individuals and variables |
|object or person that is studied, and each column represents some characteristic about | |
|that thing that is measured. Our textbook calls those objects and characteristics what | |
|two things respectively? | |
|Q3. What are the two main classes of variable types? |A3. Categorical and quantitative |
|Q4. A description, depiction, or equation telling what values a variable takes on and |A4. Distribution |
|how often it takes on these values is called the ___ of the variable. | |
|Q5. Before studying the relationships among variables, it's usually good to begin by |A5. Each variable by itself |
|examining what? | |
|Q6. Before getting numerical summaries of the data, your textbook advises exploring the |A6. Graphs |
|data with what? | |
|Q7. What two types of graphs are usually most appropriate for categorical data? |A7. Bar charts and pie charts |
|Q8. If several percentages do not represent portions of the same whole, then what type |A8. A pie chart |
|of graph is inappropriate? | |
|Q9. When you are asked to describe a distribution after looking at a graph, the general |A9. Center, shape, and spread |
|tactic is to look for an overall pattern and also for striking deviations from that | |
|pattern. When describing the overall pattern, what three features should you mention? | |
|Q10. When you are asked to describe a distribution, the general tactic is to look for an|A10. Outliers |
|overall pattern and also for striking deviations from that pattern. What are the | |
|striking deviations called? | |
|Q11. Someone wants to display the center, shape, and spread of a data set with a |A11. A stem plot |
|picture. But the person also wants to communicate, through the same graph, the | |
|individual raw data values that were collected in the study. There are too many different| |
|values that the variable takes on to make a dot plot feasible. What type of graph should | |
|the person choose? | |
|Q12. Instead of a dot plot or a stem plot, a ____ is the most common graph of the |A12. Histogram |
|distribution of one quantitative variable. | |
|Q13. What does your textbook depict as a minimum number for either the number of stems |A13. Five |
|in a stem plot, or the number of classes in a histogram? | |
|Q14. If the right and left sides of a histogram are approximately mirror images of each |A14. Symmetric |
|other, we call the distribution what? | |
|Q15. If there's a big hump on the left side of a histogram and a long tail extending far|A15. Skewed right |
|out to the right, do we say that the distribution is skewed right or skewed left? | |
|Q16. If you look at people's incomes, defining income so that zero is the smallest |A16. Skewed right |
|possible value, and your sample includes mainly middle income people but also a few | |
|extremely high income people, will the distribution be skewed right or skewed left? | |
|Q17. Mary gets a test report saying that 79% of the test takers fell at or below the |A17. Percentile |
|score that she made. The name of the type of score she got is what? | |
|Q18. A relative cumulative frequency graph is often called what? |A18. Ogive. |
|Q19. In a relative cumulative frequency graph, or ogive, the horizontal axis is for the |A19. The fraction of observations less than or equal to|
|values of the variable you are looking at. For any given value on the horizontal axis, |that value |
|what does the value on the vertical axis stand for? | |
|Q20. If you are given a relative cumulative frequency graph, and someone asks you to |A20. Find the value on the x-axis that has a 50% or .5 |
|find the center of the distribution, how do you do it? |value on the y-axis. |
|Q21. On a time plot, what axis does time go on? |A21. The horizontal axis |
|Q22. On a time plot, an overall upward or downward slope is called what? |A22. A trend |
|Q23. On a time plot, what do you call the shorter-term variations that occur regularly, |A23. Seasonal variation |
|repeating themselves in a cyclic fashion? | |
|Q24. 1/n times the summation of the x(i), where n is the number of cases and x(i) is the |A24. The mean |
|value of the ith case, is known as what? | |
|Q25. The number in a distribution such that half the observations are smaller and the |A25. The median |
|other half are larger is called what? | |
|Q26. If there is no middle value in a data set because you have an even number of |A26. You find the mean of the two center observations. |
|cases, how do you find the median then? | |
|Q27. Between the mean and median, which of these is pulled farther in the direction of |A27. The mean. |
|extreme values or outliers? | |
|Q28. If a distribution is highly skewed to the right, which value will be lower: the |A28. The median. |
|mean, or the median? | |
|Q29. From which statistic, the mean or the median, can you recover the total value of |A29. The mean |
|all the cases in your data set, if you know how many cases there are? | |
|Q30. What's the definition of the range of a distribution? |A30. The difference between the largest and smallest |
| |value |
|Q31. What's the chief problem with using the range as a measure of the spread of a |A31. It's too sensitive to outliers, and it depends on |
|distribution? |only two values in the data set. |
|Q32. What do you call the median of the subset of observations whose position in the |A32. The first quartile |
|ordered list is to the left of the overall median? | |
|Q33. What's the definition of the interquartile range? |A33. The third quartile minus the first quartile. |
|Q34. What's the rule of thumb for defining outliers in terms of the interquartile range?|A34. An outlier falls more than 1.5 times the |
| |interquartile range above the third quartile or below |
| |the first quartile. |
|Q35. What five numbers are in the so-called five number summary? |A35. The minimum, the first quartile, the median, the |
| |third quartile, and the maximum. |
|Q36. What type of graph gives a picture of the five number summary? |A36. The box plot |
|Q37. What's the difference between a regular box plot and a modified box plot? |A37. In a regular box plot, the whiskers go out to the |
| |maximum and minimum. In a modified box plot, the |
| |whiskers go out to the largest and smallest data points|
| |that are not outliers. The outliers are plotted as |
| |isolated points on a modified box plot. |
|Q38. If you take the deviation of each observation from the mean of the whole set, |A38. The variance |
|square those deviations, add those squares, and divide by one less than the number of | |
|observations, what do you call the resulting number? | |
|Q39. What is the relationship between the variance and the standard deviation? |A39. The standard deviation is the square root of the |
| |variance. |
|Q40. How is the standard deviation like the interquartile range? |A40. Both of them are measures of spread of the |
| |distribution. |
|Q41. When you average the squared deviations from the mean to find the variance of a |A41. The degrees of freedom |
|sample, what should you divide by: the number of cases (n), or the "degrees of freedom" (n - 1)? | |
|Q42. Under what conditions will a standard deviation equal zero? |A42. When all the observations have the same value. |
|Q43. Between the interquartile range and the standard deviation, which is more resistant|A43. The interquartile range |
|to the effects of the outliers? | |
|Q44. How do you choose between the five number summary on the one hand, and the mean and|A44. The mean and standard deviation are good for |
|standard deviation, on the other hand, as ways of describing a distribution? |reasonably symmetric distributions that are free of |
| |outliers. Otherwise the five number summary is usually|
| |better. |
|Q45. If you add the same number to each observation, how does that affect the center and|A45. The number that you add is added to the measures |
|the spread of the distribution? |of center, such as the mean and median. But measures of|
| |spread, such as the interquartile range and standard |
| |deviation, are not affected. |
|Q46. If you multiply each observation by the same number, how does that affect measures |A46. Both the measures of center (median and mean) and |
|of center and spread? |the measures of spread (standard deviation and |
| |interquartile range) are multiplied by the same number.|
| |(The variance, which is also a measure of spread, is |
| |multiplied by the square of the number each observation|
| |is multiplied by.) |
|Q47. What are three graphical methods of comparing distributions? |A47. Side-by-side bar graphs, back-to-back stem plots, |
| |and side-by-side box plots. |
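The numerical summaries in Q24 through Q46 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `five_number_summary` and `outliers` and the data set are made up for this example, and the quartiles follow the "median of each half" rule from Q32, which can differ slightly from what a calculator or software package reports.

```python
import math
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the 'median of each half' rule."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]              # observations to the left of the overall median
    upper = xs[half + n % 2:]      # observations to the right of the overall median
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])


def outliers(values):
    """Flag points more than 1.5 * IQR below Q1 or above Q3 (Q34)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x < low or x > high]


# Made-up, right-skewed data with one unusually large value.
data = [2, 4, 4, 5, 6, 7, 8, 9, 30]

summary = five_number_summary(data)
flagged = outliers(data)
mean = statistics.mean(data)        # pulled toward the large value (Q27, Q28)
median = statistics.median(data)
sd = statistics.stdev(data)         # divides by n - 1, the degrees of freedom (Q41)
var = statistics.variance(data)     # sd is the square root of this (Q39)

# Q45: adding a constant shifts the center but leaves the spread alone.
shifted = [x + 10 for x in data]
# Q46: multiplying by a constant scales center and spread; variance scales by its square.
scaled = [3 * x for x in data]

print("five-number summary:", summary)
print("outliers:", flagged)
print("mean > median:", mean > median)
print("sd unchanged by shift:", math.isclose(statistics.stdev(shifted), sd))
print("sd tripled by scaling:", math.isclose(statistics.stdev(scaled), 3 * sd))
```

Here the value 30 is flagged by the 1.5 x IQR rule, and the mean exceeds the median, as Q28 predicts for a right-skewed distribution.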
TPS Chapter 2: The Normal Distribution
|Q1. The scales of density curves are adjusted so that the total area under each curve is |A1. One |
|what? | |
|Q2. The area under the density curves between a couple of x-axis values represents what? |A2. The proportion of all observations that fall |
| |between those values. |
|Q3. Do measures of center and spread apply to a density curve as well as to sets of |A3. Yes |
|observations? | |
|Q4. How do you define the median of the density curve? |A4. The point with half the area under the curve to its|
| |left and the remaining half of the area to its right. |
|Q5. The quartiles of a density curve divide the area into what? |A5. Four equal parts. |
|Q6. What is the relationship between the mean and the median of a symmetric density |A6. They are equal. |
|curve? | |
|Q7. Which is pulled the farther toward the tail of a skewed distribution: the median, or |A7. The mean |
|the mean? | |
|Q8. In conventional notation, what are the meanings of x-bar and s, as contrasted to mu |A8. The first two refer to the mean and standard |
|and sigma? |deviation, respectively, of a set of observations, a |
| |sample. The second two refer to the mean and standard |
| |deviation, respectively, of a density curve or |
| |idealized distribution, or the population distribution.|
|Q9. What three features describe the overall shape of a normal curve? |A9. Normal curves are symmetric, single peaked, and |
| |bell shaped. |
|Q10. Is there only one normal curve, or is there an infinite number of normal curves? |A10. An infinite number. |
|Q11. For any given mean and standard deviation, is there only one normal curve, or an |A11. Only one. |
|infinite number of normal curves? | |
|Q12. How can you visually find the points one standard deviation from the mean of a |A12. Those points are the inflection points of the |
|normal curve? |curve. That is, the curve changes from falling more |
| |and more steeply to falling less and less steeply, or |
| |vice versa. (Optional answer for calculus lovers: they |
| |are points where the second derivative of the curve |
| |equals zero.) |
|Q13. The distributions of test scores, of measures of characteristics of living things, |A13. The normal distribution |
|and of summary statistics for chance outcomes repeated many times, often (but not | |
|always!) follow what type of distribution? | |
|Q14. What three percentages do you have to remember when you are stating the “empirical |A14. 68%, 95%, and 99.7%. |
|rule”? | |
|Q15. Are the three percentages for 1, 2, and 3 standard deviations exact, or |A15. Approximations. |
|easier-to-remember rounded approximations? | |
|Q16. What do the three percentages in the empirical rule apply to? In other words, what |A16. The three numbers tell the percent of observations |
|is the meaning of this rule? |falling within the region plus or minus 1, 2, or 3 |
| |standard deviations from the mean, respectively, in a |
| |normal curve. (Note that the percents refer to the |
| |percent of observations encompassed by the interval |
| |from that number of standard deviations below the mean |
| |to that number above the mean.) |
|Q17. True or false: If Mary scores one standard deviation above the mean on a normally |A17. True |
|distributed test, then approximately 68% of the test takers scored as close to the mean | |
|of the test as, or closer to the mean than, Mary did. | |
|Q18. True or false: If Mary scores one standard deviation above the mean on a normally |A18. False |
|distributed test, her score is in the 68th percentile. | |
|Q19. True or false: if Mary scores one standard deviation above the mean on a normally |A19. True |
|distributed test, half of 68% or 34% are above the mean but at or below Mary’s score. An| |
|additional 50% are below the mean. Thus Mary equals or surpasses 50% plus 34% of the | |
|test takers, and is at the 84th percentile. | |
|Q20. What does the notation N(100,15) mean? |A20. It denotes a normal distribution with mean 100 |
| |and standard deviation 15. |
|Q21. True or false: the standard score for any observation tells how many standard |A21. True |
|deviations that score is from the mean. | |
|Q22. What two operations do we do, to standardize a score? |A22. Subtract the mean and divide by the standard |
| |deviation. |
|Q23. A standard score is often called by what other term? |A23. The z-score. |
|Q24. What does the sign of a standard score correspond to? |A24. If the z-score is positive, it’s above the mean, |
| |and if negative, below the mean. |
|Q25. Are there an infinite number of standard normal curves, each with its own |A25. There is just one standard normal curve, with only|
|equation describing it, or just one standard normal curve, with just one equation |one equation describing it. |
|describing it? | |
|Q26. In a table of areas under the standard normal curve, what does the table entry for |A26. The area under the curve to the left of z, or in |
|each z score represent? |other words, the proportion of cases with values less |
| |than z. |
|Q27. What steps do you follow to use the z table to solve the following “problem prototype”:|A27. First we standardize x (by subtracting mu and |
|given N(mu, sigma), please find the proportion of observations less than x? |dividing by sigma). Then we look at the z table to find|
| |the proportion of the distribution less than the z |
| |score we’ve obtained. |
|Q28. What steps do you follow with a z table if you want to know what proportion of the |A28. Look up the proportion less than the first, and |
|scores are between two values? |less than the second, and find the difference between |
| |the two proportions. |
|Q29. What two pictures does TPS recommend drawing when solving problems where you are |A29. They recommend drawing unstandardized and |
|given a normal curve and asked for proportions of the observations? |standardized normal curves, and shading in the areas |
| |that are asked for. |
|Q30. What do the authors recommend (as a word to the wise for future test-takers) as the |A30. They recommend stating the conclusion in the |
|last step of problems giving a normal distribution and asking for proportions of |context of the problem. Thus rather than just saying, |
|observations? |the answer is 49%, you would say, “About 49% of boys |
| |have cholesterol levels between 170 and 240 mg/dl.” |
|Q31. What steps do you go through when you want to find a value given a proportion of a |A31. You look for the proportion in the body of the |
|normal distribution, using the z-table? |table, and you find at the margin the z-score that |
| |corresponds to it. Then you “unstandardize” the |
| |z-score. |
|Q32. What operations do you do, in what order, to “unstandardize” a z-score, or turn the |A32. You multiply the z score by the standard |
|z-score into a raw score? |deviation, and then you add it to the mean. |
|Q33. Suppose you have a data set, and you want to see if it is approximately normally |A33. Make a frequency histogram or stemplot, and see if|
|distributed. What’s the first thing to do, before doing calculations? |the curve looks bell-shaped and symmetric. |
|Q34. What’s a way of checking a data set for normality, using the empirical rule? |A34. Find the mean and sd of the data set, and count |
| |(or get a computer to count) the proportions of |
| |observations that are within 1, 2, and 3 standard |
| |deviations of the mean. See if these proportions |
| |correspond, roughly, to .68, .95, and .997. |
|Q35. True or false: The point of making a normal probability plot is to see whether a set|A35. True |
|of numbers is normally distributed. | |
|Q36. If you were to take any set of numbers, and plot the numbers on the x-axis, and |A36. A line. |
|their z-scores on the y-axis, you would get what shape for your graph? (Hint1: y values | |
|are (1/sd)*x - mean/sd, or of the form y=mx +b) (Hint 2: Standardizing a score involves a| |
|linear transformation.) | |
|Q37. On a normal probability plot for a set of observations, what goes on the x axis? |A37. The values of the observations themselves. |
|Q38. True or false: On a normal probability plot, what goes on the y-axis for each x |A38. True. |
|value is the z score that would be associated with the percentile for that value, | |
|assuming a normal distribution (and using midpoints of intervals in finding percentiles).| |
|Q39. What conclusion do you come to if a normal probability plot is not linear? |A39. That the data are not normally distributed. |
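The z-table procedures in Q22 through Q32 can be checked with Python's standard library, where `statistics.NormalDist` (Python 3.8 or later) stands in for the printed table. This is a sketch under assumptions: the helper names `z_score` and `unstandardize` are made up for illustration, and the N(100, 15) distribution is borrowed from Q20 rather than from any particular textbook problem.

```python
from statistics import NormalDist

mu, sigma = 100, 15            # the N(100, 15) distribution from Q20
std_normal = NormalDist()      # the single standard normal curve (Q25)


def z_score(x, mu, sigma):
    """Standardize: subtract the mean, then divide by the standard deviation (Q22)."""
    return (x - mu) / sigma


def unstandardize(z, mu, sigma):
    """Reverse the process: multiply by the standard deviation, then add the mean (Q32)."""
    return mu + z * sigma


# Q27: proportion of observations less than x = 115.
z = z_score(115, mu, sigma)
below = std_normal.cdf(z)      # area to the left of z, like a z-table entry (Q26)

# Q28: proportion between two values is the difference of two "less than" proportions.
between = std_normal.cdf(z_score(115, mu, sigma)) - std_normal.cdf(z_score(85, mu, sigma))

# Q14 to Q16: the empirical rule percentages are rounded versions of these areas.
within = [std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)]

# Q31: find the value with 90% of observations below it.
z90 = std_normal.inv_cdf(0.90)  # like searching the body of the table for 0.90
x90 = unstandardize(z90, mu, sigma)

print(f"z for 115: {z:.2f}, proportion below: {below:.4f}")
print(f"proportion between 85 and 115: {between:.4f}")
print("within 1, 2, 3 sd:", [round(p, 4) for p in within])
print(f"90th percentile: {x90:.2f}")
```

A score one standard deviation above the mean lands at about the 84th percentile, which matches the reasoning in Q19, and the three areas come out close to 68%, 95%, and 99.7%.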
TPS Chapter 3: Examining Relationships
|Q1. Suppose that a researcher wants to study the effect of people’s ever having taken the|A1. Ingestion of ecstasy is the explanatory variable |
|drug "ecstasy" upon the people’s memory scores when tested. Which of these is the |and memory test scores are the response variable. |
|response variable and which is the explanatory variable? | |
|Q2. How do the terms "dependent variable" and "independent variable" correspond to the |A2. Dependent corresponds to response, and independent |
|terms “response variable” and “explanatory variable”? |corresponds to explanatory. |
|Q3. Is it proper to use the terms, response variable and explanatory variable, if the |A3. Yes. No implication of causation is contained in |
|explanatory variable does not actually cause the response variable? |the terms explanatory and response (or independent and |
| |dependent). |
|Q4. What's the order of tasks involved in examining relationships between two |A4. First plot the data, then use numerical summaries. |
|variables? |Look for the overall pattern and deviations from that |
| |pattern, and when the overall pattern is regular, use a|
| |mathematical model to describe it. |
|Q5. Suppose that someone has math scores for the children in one classroom, and English |A5. It doesn't make sense to use a scatterplot in this |
|scores for a second set of children in another classroom. The person asks you about |situation, because a scatterplot involves graphing two |
|making a scatterplot for these data. What would you reply? |variables measured upon the same individuals. |
|Q6. True or false: in a scatterplot, each point represents one individual; the |A6. True |
|x-coordinate of the point represents the value of one variable and the y-coordinate | |
|represents the value of another variable measured on that same individual. | |
|Q7. If there is an explanatory variable, which axis should it be graphed on? |A7. The x-axis |
|Q8. When describing a scatterplot, what three aspects of the pattern should you refer to?|A8. The form, the direction, and the strength of the |
| |relationship. |
|Q9. True or false: in describing the form of a scatterplot, it's important to say whether|A9. True |
|the graph appears to be linear or not. | |
|Q10. In describing the form of a scatterplot, what term do you use if the values tend to |A10. You say that there are clusters. |
|fall into two or more groups that are separated from one another by gaps? | |
|Q11. In describing the direction of a scatterplot, when there is a positive or negative |A11. Positively or negatively associated. |
|slope, we say that the variables are positively or negatively what? | |
|Q12. When any given x value on a scatterplot has very widely varying y values associated |A12. strong |
|with it: the more widely varying the y values, the less _____ is the relationship between| |
|the two variables. | |
|Q13. When you are drawing a scatterplot, what symbols should you use in showing the axes |A13. You use a symbol that looks like two slashes to |
|if the origin of the graph is not at zero? |indicate a break in the scale. |
|Q14. What are about 3 other guidelines on how to draw scatterplots properly? |A14. Make the intervals uniform. Label both axes. |
| |Choose a scale that makes your graph big enough. |
|Q15. Suppose that you want your scatterplot to reflect the influence of a particular |A15. Use a different symbol on the scatterplot for the |
|categorical variable, in addition to the relationship of the two quantitative variables |points designating males than for those designating |
|that are plotted. For example, suppose you want to graph the relation between |females. |
|entertainment violence and real-life violence for males and females on the same graph, in| |
|such a way that displays the relationship separately for males and females. What should | |
|you do? | |
|Q16. A common problem in constructing a scatterplot occurs when two or more individuals |A16. Use a different plotting symbol to call attention |
|have exactly the same values for each of the two variables. What should you do in that |to those points. |
|case? | |
|Q17. Which is a better method for judging the strength of a linear relationship: simply |A17. A calculated statistic works better, because our |
|to look at a graph, or to use a calculated numerical statistic that summarizes the |eyes can be deceived by the different scaling methods |
|strength of the linear relationship? Why do you think your chosen method is better? |used in graphs. |
|Q18. What is the summary statistic that measures the strength of a linear relationship? |A18. The correlation coefficient. |
|Q19. We’ve used Greek letter mu to represent a population mean; x-bar to represent a |A19. r |
|sample mean; Greek sigma to represent the standard deviation, and s to represent the | |
|sample standard deviation. What letter does our book use to designate what it calls the | |
|correlation? | |
|Q20. Given that the letter r, for the correlation coefficient, is in our own alphabet and|A20. sample statistic |
|not the Greek alphabet, do you think it refers to a sample statistic or a population | |
|parameter? | |
|Q21. Would you guess that there is some other Greek letter that refers to the population |A21. Yes. (It’s the letter rho, which looks pretty |
|value of the correlation coefficient? |much like a p!) |
|Q22. When you look at the formula for the sample correlation coefficient that your text |A22. These are the standard scores, or z-scores, for |
|gives, you see (xi-xbar)/sx and (yi-ybar)/sy. Can you give a simpler name to these |the ith individual. The first factor is the z-score |
|expressions? |for the x variable and the second is the z-score for |
| |the y variable. |
|Q23. What is the meaning of a positive and negative sign associated with the correlation |A23. A positive sign means there’s a positive |
|coefficient? |association between the variables; in other words, |
| |higher values of one are associated with higher values |
| |of the other. A negative sign means there’s a negative |
| |association; that is, higher values of one variable are|
| |associated with lower values of the other. |
|Q24. Suppose one person calculates the correlation of IQ score of some individuals with |A24. The same correlation. The correlation coefficient |
|number of boxing matches fought, testing the hypothesis that boxing (the explanatory |is not affected by which variable is considered |
|variable) affects IQ (the response variable). A second person, using the same data set, |explanatory and which is considered response. |
|also calculates the correlation of the number of fights with IQ score, only this person | |
|thinks of IQ as the explanatory variable and number of fights as the response variable. | |
|Do they get the same correlation, or different ones? | |
|Q25. Suppose someone codes race as follows: 0=Caucasian, 1=African American, 2=Asian, |A25. The problem is that the correlation coefficient is|
|3=Hispanic, 4=American Indian, 5=Other. Then someone calculates a correlation between race|to be used with quantitative variables, not categorical|
|and a reading test score for a sample of kids. Do you have a problem with this? If so, |variables like this. The obtained correlation would be |
|what’s your problem? |meaningless, and an artifact of the arbitrary coding |
| |system. |
|Q26. Melinda computes a correlation between the height of mothers and their daughters. |A26. Melinda did not blow it, because the correlation |
|Lunk is looking at the computations, and says, “You blew it! You have the height of |coefficient comes out the same no matter what units are|
|mothers measured in centimeters, and the height of the daughters measured in inches!” |used. (This is because a transformation from one unit |
|Please tell whether Melinda needs to do anything to fix her correlation coefficient, and |to another (which involves multiplying each number in |
|if so, what? |the data set by the same number) multiplies both the |
| |mean and the standard deviation of the data set by the |
| |same number (as was learned in chapter 1). The |
| |z-score, which is (xi-xbar)/sx comes out the same, |
| |because each of the three numbers that make up the |
| |z-score is multiplied by the same factor, and that |
| |factor cancels out. Since the z-scores are not affected|
| |by changes of units, the correlation coefficient is |
| |also not affected.) |
|Q27. What range of values is possible for the correlation coefficient? |A27. -1 to +1. |
|Q28. What sort of correlation coefficient do you find when two variables have a very |A28. A correlation close to –1. |
|strong linear relationship, and when as the first gets greater, the second gets smaller? | |
|Q29. Suppose the data points are two variables collected for all the days of 2005. For |A29. You’d guess a correlation of about 0, since there |
|each of those days, imagine that we know (variable 1) the number of words your instructor|is no reason to expect that these two variables would |
|for this course spoke in that day, and (variable 2) the peak barometric pressure for that|rise and fall in association with each other. |
|day in Caracas, Venezuela. About what would you guess the correlation between these two | |
|variables to be? Why? | |
|Q30. Suppose there are two variables which, when graphed in a scatterplot, form an almost|A30. No, because the correlation coefficient measures |
|perfect u-shaped parabola. Would the strong relationship between these variables imply a |the strength of linear relationships only, not |
|high correlation coefficient (meaning one close to 1)? Why or why not? |curvilinear relationships. A u-shaped curve isn’t a |
| |straight line! |
|Q31. Does the correlation coefficient resemble the median and interquartile range in |A31. Like the mean and sd, the correlation coefficient |
|being fairly resistant to outliers, or resemble the mean and standard deviation in being |can be greatly influenced by outliers. |
|heavily influenced by outliers? | |
|Q32. Someone practices guessing correlation coefficients from scatterplots using an |A32. Because the scales of the variables are not |
|“applet” on the internet. Why should the person not get too confident of his or her |necessarily the same as they were on the applet, and |
|guessing power given scatterplots of real-life data? |scales can throw off “eyeball” estimates. |
|Q33. In attempting to give a more complete description of a set of data involving two |A33. The mean and sd, because the formula for the |
|variables, someone wants to give a measure of center and spread as well as measure of the|correlation uses the mean and sd. |
|correlation coefficient. Assuming the person has made a good decision to use the | |
|correlation coefficient, what measures of center and spread would be most consistent with| |
|the correlation coefficient: the mean and sd or the median and IQR? | |
|Q34. The women in a corporation think that they are being discriminated against in their |A34. It’s not valid. The correlation coefficient |
|salaries. A management spokesman says to them, “Look at this plot. The first data point |measures the predictability of one score from another, |
|is the average salary for men who have worked here 1 year, put into an ordered pair with |not the equality of the two scores. Adding the same |
|the salary for women who have worked here one year. The second ordered pair is the |value to all values of either x or y does not change |
|average salary for men and women with two years’ experience, and so forth. The |the correlation, and neither does subtracting a |
|correlation between men’s salaries and women’s salaries is .95! That’s almost a perfect |constant or multiplying or dividing by a positive |
|correlation! You women have nothing to complain about!” Is this argument valid? Why or |constant. So the salaries of women could be half the |
|why not? |comparable salaries of men, or $10,000 less than them, |
| |and you could still get a high correlation. |
|Q35. Suppose that you have a data set with a correlation fairly close to 0. All the |A35. The correlation will become close to 1, because |
|numbers for both variables are between 0 and 10. There are about 10 individuals in the |this one outlier has such a strong effect. |
|data set. Then suppose that one more individual gets added, an outlier with a value of | |
|100, 100. What do you think the correlation coefficient will become? (Try it out with | |
|your calculator or minitab if you want, or mimic this situation on an “applet.”) | |
|Q36. True or false: In a regression line, like a correlation coefficient, you get the |A36. False. The change in y per unit change in x, for |
|same numbers (slopes and intercepts) no matter which variable is considered the |example, is not the same as the change in x per unit |
|explanatory variable and which is considered the response variable. |change in y. |
|Q37. Please explain, for a least squares regression line: the sum of the squares of what |A37. The squares of the errors for each data point, |
|are being minimized? |where the errors are the vertical distances from the data point |
| |to the regression line. (The word residuals is also |
| |correct.) |
|Q38. Please explain why the distance from the data point to the regression line |A38. The regression line gives predicted values of y |
|corresponds to the idea of an “error.” |(called y-hat) for each x. There is also an actual |
| |observed value y for each x, for each data point. The |
| |difference between the actual and the predicted value |
| |is the “error” in prediction that is made by using the |
| |regression line to predict the response variable. |
|Q39. What’s the formula for the slope of a regression line, in terms of the correlation |A39. b=r (sy/sx). |
|between the two variables and their standard deviations? (Call the slope b, the | |
|correlation r, and the two sd’s sx and sy.) | |
|Q40. Every least-squares regression line passes through what point? |A40. It passes through xbar, ybar, the ordered pair |
| |formed by the means of both variables. |
|Q41. Once you know the slope of a regression line, how would you find the y-intercept, |A41. a=ybar- b*xbar. You get this by just solving for |
|knowing the means of the x values and the y values? (call the intercept a, and the means |a the equation ybar=b*xbar + a. And the second equation|
|for x and y xbar and ybar.) |comes from the fact that ybar is always the predicted |
| |value of y for xbar. |
|Q42. When you have a regression equation delivered by the computer software output, and |A42. Just substitute the value of x into the equation |
|someone asks you for the predicted value of y given a certain x value, what do you do? |and solve for the predicted y value. |
|Q43. Suppose that someone measures height as a function of weight for a bunch of human |A43. Because the y-intercept corresponds to the height |
|adults, and gets a regression equation predicting height as a function of weight. Why is |of someone with weight 0. But the weight of 0 is far |
|the y-intercept of the equation not as meaningful or important as the slope, or as the |outside the range of weights measured in the study and |
|equation as a whole? |thus the height predicted will be an extrapolation. |
| |Secondly, the weight of 0 is one seldom found in human |
| |beings, (at least those who have already been born and |
| |aren’t dead yet)! |
|Q44. Suppose you have a regression equation output from a computer and you are asked to |A44. Just pick two values of x, and calculate the yhat |
|plot the line by hand. How would you do it? |values for each, and connect those two dots. It helps |
| |if you pick points that are close to the bottom and top|
| |ends of the range. (One easy point is the y-intercept.)|
|Q45. When, in the context of regression, people speak about the SST (sum of squares |A45. The sum of the squared deviations of each y value |
|total), what do they mean by that? |from the mean of the y values. |
|Q46. When in the context of regression people speak of the SSE or sum of squares for |A46. They mean the sum of the squares of the deviations|
|error, what do they mean by that? |of the actual y values from the predicted y values. |
| |(These deviations are also called residuals.) |
|Q47. Your book doesn’t define very explicitly in this chapter what the sum of squares for|A47. Yes. |
|regression is. Do you think it would be reasonable to think of that as the sum of all the| |
|squared deviations of the predicted y values (the y-hats) from the mean of the y values? | |
|Particularly if a trustworthy source hinted that it was? (P.S. you can calculate the | |
|SSReg in your head, easily, for the 3-point data set of example 3.11 on page 160: it | |
|comes out to 32. The SST comes out to 38, and the SSE to 6.) | |
|Q48. The book speaks of the sum of squares for the regression as the SST-SSE, or the sum |A48. Yes. |
|of squares total minus the sum of squares for error. Can we infer from this that the | |
|total sum of squares, SST, can be partitioned into the SSReg (sum of squares for | |
|regression) and the SSE, (sum of squares for error), and that SST=SSReg+SSE? (P.S. I use | |
|the notation SSReg so as not to confuse sum of squares for regression with sum of squares| |
|for residuals.) | |
|Q49. The square of the correlation coefficient, or r-squared, a.k.a. the coefficient of |A49. r-squared = the SSReg/SST or (SST-SSE)/SST. The |
|determination, means what in terms of the fraction of the total sum of squares? Please |r-squared is the fraction of the total sum of squares |
|answer in symbols and words. |that is accounted for by the regression of y on x. |
|Q50. One person studies IQ as a function of number of boxing matches participated in, and|A50. No. The first slope tells how many points IQ |
|another uses the same data set to study boxing matches participated in as a function of |changes per additional boxing match, and the second |
|IQ. (That is, matches is the explanatory variable in the first study and IQ is the |slope tells how many fewer boxing matches someone has |
|explanatory variable in the second.) Do they both get the same value for the slope of the|for each additional IQ point. |
|regression line? Can you explain in words the reason for this answer? (You may assume a | |
|negative relationship between the two variables in constructing language for your | |
|answer.) | |
|Q51. The slope of the regression line b is equal to r*(sy/sx). Along the regression |A51. r standard deviations of y, or r*sy. |
|line, a change of 1 standard deviation in x results in a change of how many standard | |
|deviations of y? (Hint: the slope is the change in y over the change in x. So the change | |
|in y equals the slope times the change in x. So if the change in x is sx, we get | |
|r*(sy/sx)*sx, which equals...) | |
|Q52. True or false: the slope of the regression line tells you how many unstandardized |A52. True. |
|units the predicted value of y changes for each unstandardized unit change in x. | |
|Q53. True or false: the correlation coefficient tells you how many standard deviations |A53. True. |
|the predicted y changes for each standard deviation change in x. | |
|Q54. True or false: If both of two variables y and x are standardized, (so that the |A54. True. |
|standard deviation of both is 1) then the slope of the regression line and the | |
|correlation coefficient are equal. | |
|Q55. What is another name for y – yhat, or the deviation of the observed y value from the|A55. The residual. |
|predicted value, or the error in prediction for a given value, or the vertical distance | |
|between any data point and the regression line? | |
|Q56. True or false: when we speak of a “least squares” regression line, we mean that we |A56. True. |
|choose the line so as to minimize the squares of the residuals. | |
|Q57. Someone draws a graph of residuals (on the y axis) versus the values of the |A57. A residual plot. |
|explanatory variable. This graph is called what? | |
|Q58. Someone draws a residual plot, and all the values are positive. Someone says to that|A58. Because the mean of the least-squares residuals is|
|person, “There must be some mistake.” Why did the person say that? |always zero; thus if you have positive values you have |
| |to have at least one negative value. |
|Q59. If the linear regression equation fits the data well, what do you see on the |A59. A uniform scatter of points, without a clear |
|residual plot? |pattern, and with no unusual individual observations. |
|Q60. What do you call a data point that has a big effect on the slope or intercept of the|A60. An influential point. |
|regression line? | |
|Q61. Does an influential point necessarily have a large residual? (Hint: the influential |A61. No. |
|point can pull the line close to it.) | |
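The formulas in Q39 through Q58 of this chapter can be checked numerically. Below is a minimal sketch, assuming a small made-up data set (the values of xs and ys are hypothetical and are not the data of example 3.11). It finds the slope from b=r(sy/sx), the intercept from a=ybar-b*xbar, and then computes the SST, SSE, and SSReg so that you can see SST=SSReg+SSE and r-squared=SSReg/SST for yourself.

```python
from math import isclose
from statistics import mean, stdev

# Hypothetical data, made up for this sketch (not taken from the textbook).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def correlation(xs, ys):
    """Correlation r, computed with n - 1 in the denominator."""
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    cross = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return cross / ((n - 1) * sx * sy)


r = correlation(xs, ys)
b = r * stdev(ys) / stdev(xs)   # slope: b = r(sy/sx)
a = mean(ys) - b * mean(xs)     # intercept: a = ybar - b*xbar

# Predicted values (y-hats) and residuals (observed y minus predicted y).
yhat = [a + b * x for x in xs]
residuals = [y - yh for y, yh in zip(ys, yhat)]

ybar = mean(ys)
sst = sum((y - ybar) ** 2 for y in ys)         # total sum of squares
sse = sum(e ** 2 for e in residuals)           # sum of squares for error
ssreg = sum((yh - ybar) ** 2 for yh in yhat)   # sum of squares for regression

print(f"slope b = {b:.3f}, intercept a = {a:.3f}")
print(f"SST = {sst:.3f}, SSE = {sse:.3f}, SSReg = {ssreg:.3f}")
print(f"r^2 = {r ** 2:.3f}, SSReg/SST = {ssreg / sst:.3f}")
print(f"sum of residuals = {sum(residuals):.3f}")
```

For these made-up numbers the slope should come out to 0.6 and the intercept to 2.2, with SST=6, SSE=2.4, and SSReg=3.6. The residuals sum to zero (up to rounding), which is why a residual plot with all positive values signals a mistake.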
TPS Chapter 4: More on Two-Variable Data
|Q1. In the example at the beginning of this chapter, a plot of the log of brain weight as|A1. A better fit means that the data are more linear – |
|a function of the log of body weight provides a “better fit” for the observed data than a|a linear model is more successful in describing the |
|simple plot of brain weight as a function of body weight. What is meant by better fit? |relationship, the correlation coefficient is higher, |
|(This is also the answer to the question, what are we trying to do (at least in this |the sum of squares of residuals for a regression line |
|course) when we transform data?) |is lower. |
|Q2. Can you figure out why we would want to transform data so as to get a more linear |A2. Because then we can use the methods of correlation |
|relationship? |and regression that were studied in the previous |
| |chapter and will be studied more in future chapters, |
| |which are very powerful methods of statistical |
| |analysis. |
|Q3. True or false: if we have a curvilinear function, and we want to straighten it out to|A3. True. Linear transformations don’t straighten |
|make a linear function, we can’t do that by multiplying or dividing by constants or |curves. |
|adding or subtracting constants (i.e. by using linear transformations). | |
|Q4. What are the transformations that are most commonly used, other than linear |A4. Positive and negative powers, and logarithms. |
|transformations? | |
|Q5. What is the definition of a monotonic function? |A5. It’s one where as x increases, y always increases |
| |(a monotonic increasing function) or as x increases, y |
| |always decreases (a monotonic decreasing function). It |
| |produces a graph that doesn’t go up and dip down, but |
| |consistently has either a positive or negative slope. |
|Q6. Is it kosher to speak of a function as being, for example, monotonic increasing over |A6. Yes. An example is y=x^2, which is monotonic |
|part of the domain of x, and monotonic decreasing over another part? If so, can you give |decreasing for negative values of x, and monotonic |
|an example? |increasing for positive values of x. |
|Q7. True or false: There are often two steps in transformation. The second is to apply a |A7. True. |
|power or logarithmic function that simplifies the data. The first is to use a linear | |
|transformation, such as adding a constant, that makes the values all positive, so that | |
|the function applied in the second step will be defined and monotonic increasing. | |
|Q8. How is the ladder of power functions useful? |A8. When we are trying to straighten out curved data |
| |sets, we can go in one direction or the other along the|
| |ladder, seeing how straight the line becomes, rather |
| |than just randomly picking different functions. |
|Q9. Linear growth is to adding a fixed amount per unit time as exponential growth is to |A9. Multiplying. |
|______ by a fixed amount per unit time. | |
|Q10. If the number of a certain type of bacteria doubles every two hours, is that linear|A10. Exponential. |
|growth or exponential growth? | |
|Q11. Increasing everyone’s salary by a certain percentage is to ______ growth as |A11. Exponential, linear. |
|increasing everyone’s salary by the same dollar amount is to _______ growth. | |
|Q12. Suppose we have a function y=ab^x, where a and b are constants and x is the |A12. An exponential function. (This is a function like |
|explanatory or independent variable, and y is the response or dependent variable. Is this|y=2^x.) |
|an example of an exponential function, or a power function? | |
|Q13. Suppose we have a function y=ax^b, where a and b are constants and x is the |A13. A power function. (This is a function like |
|explanatory variable and y is the response variable. Is this an example of an exponential|y=x^2.) |
|function, or a power function? | |
|Q14. If y is an exponential function of x, plotting what function of y versus x should |A14. The log of y versus x. |
|result in a linear graph? | |
|Q15. Suppose you do a regression of the log (base 10) of y versus x, and you get a nice |A15. You’d just use your equation to find the predicted|
|linear scatterplot and a high coefficient of determination (r^2). Now you can use this |value of log y. Then you take the antilog (that is, 10 |
|linear relationship for prediction. Suppose someone (like a test-maker) asks you what the|raised to that number) to get the predicted value of y.|
|predicted value is of y (not log y) for a given value of x. How would you find it? |In other words, you “untransform” the value back to the|
| |original scale. |
|Q16. If a variable grows exponentially, its logarithm grows how? |A16. Linearly. |
|Q17. To make an exponential function linear, we use the log transformation just with the |A17. Both the explanatory and the response variable. |
|response variable y. To make a power function linear, we use the log transformation with | |
|what? | |
|Q18. If you start with the power function y=ax^p, and take the log of both sides, what |A18. log y=log a +p log x. |
|result do you end up with? | |
|Q19. Suppose you have a data set, and its scatterplot is curved. Then you take the log of|A19. That the original variables were related according|
|both explanatory and response variables, and plot them, and you get a line. What do you |to a power function (or power law). |
|infer from this? | |
|Q20. When you plot the log of y vs. the log of x, do you give any meaningful |A20. According to the equation log y =log a + p log x, |
|interpretation to the slope of the line that you get? If so, what is it? |the slope of the line is the power to which x is raised|
| |in the original power function. |
|Q21. Jane gets a regression coefficient (i.e. a slope) of 3.617 when regressing log y vs.|A21. Yes. The slope you obtain in any given experiment |
|log x. She says, “Now I know that x and y are related in a power function, and y= a |is an estimate of the population value of the slope, |
|constant times x to the 3.617 power.” Do you think this conclusion should be tempered or |and not an exact rendering of it. Another sample is |
|qualified? If so, how? |very likely to give a different slope, and it could |
| |possibly even lead to a different conclusion about the |
| |form of the functional relationship! |
|Q22. Suppose you plot the log y vs. the log x and you get a good line, with intercept 2 |A22. You just take the antilog of both sides. You get |
|and slope 3. So log y=2+3log x. Now you are asked to find the equation for y in terms of|y= 10^(2+3 log x), or y=10^2*(10^log x)^3, or |
|x, without logs in it. How do you do this? |y=100*x^3. That is, y=100 times x cubed. |
|Q23. Suppose I find that in the range of 3 to 7 milligrams of Ritalin given to a group of|A23. Extrapolation, which is using the regression |
|children, their math scores rise in linear fashion with increasing dose. A parent looks |equation to make predictions for values of the |
|at the regression equation and says, “By my calculations, all it would take would be 400 |explanatory variable that we have no experience with. |
|milligrams of Ritalin for my child to get a 5 on the AP Statistics exam, while he’s | |
|still in 3rd grade.” What do we call this type of reasoning (which often leads to | |
|incorrect conclusions)? | |
|Q24. I notice a linear relationship between shoe size and basketball prowess, and propose|A24. A lurking variable. |
|to help my daughter’s fifth grade basketball team win by dressing them all in size 15 | |
|Converse All-Stars. Someone says, “But in your study, there’s another variable, namely | |
|height, that was not among your explanatory or response variables and yet may influence | |
|the interpretation of the relationship between shoe size and basketball skill.” What type| |
|of variable is height, in this situation? | |
|Q25. In the example you just read, a lurking variable enhanced the apparent association |A25. Yes. The relation of overcrowding and lack of |
|between two variables. Can lurking variables also mask or attenuate the apparent |indoor toilets on page 227 of TPS is an example. Here’s|
|association between two variables? If so, can you think of an example? |another: suppose a researcher finds that the |
| |correlation between violent entertainment viewed and |
| |aggressive behavior in real life is low. Then someone |
| |reanalyzes the data and finds that when you make |
| |separate scatterplots for males and females, you get a |
| |strong relationship. Males are much more aggressive |
| |than females, and when you throw the data together on |
| |one scatterplot you get more of a horizontal line than |
| |a sloped one. Thus the lurking variable of gender |
| |obscured the real association that was there. |
|Q26. What’s one way of discovering lurking variables that someone may not have thought |A26. Plot the data as a function of time. You may be able |
|about? |to see relationships that can be linked to events that |
| |happened at a certain time, or conditions that changed |
| |at a certain time. |
|Q27. One researcher studies the murder rate of every state as a function of the average |A27. Correlations with averaged data are generally |
|literacy level of the teenagers in that state. The correlation is reported to be very |considerably higher than correlations obtained with |
|strong (and close to –1). Another researcher studies aggression in individuals in a |individuals. This is because some of the random |
|high school, as a function of their reading scores. The correlation, though in the same |variation in each variable gets reduced when you sum or|
|direction, shows a much weaker relationship. Can you explain why? |average over many individuals. (Later on in the course,|
| |you’ll find out that the variance of the mean of a set |
| |of observations for a population is equal to 1/n times |
| |the variance of the individual observations, where n is|
| |the number of observations.) The less “noisy” both |
| |variables are, the more highly they tend to correlate |
| |with each other. |
|Q28. When two variables X and Y are found to correlate with each other, of course two |A28. Common response (z causes both x and y) and |
|possible explanations for this association are 1) that X causes Y, and 2) (one not |confounding (z, which is associated with x, may cause |
|diagrammed on page 232) that Y causes X. Please name the other two possible explanations |y). |
|that are good to keep in mind when interpreting findings of associations. | |
|Q29. Suppose a researcher studies the effects of a way of teaching children not to be |A29. That they are CONFOUNDED with the intervention. |
|violent. The researcher gives the instruction to all the children in Mrs. Harmony’s |Thus the effects of these teacher variables can’t be |
|classroom, and uses the kids in Mr. Gutsly’s classroom as a comparison group. But then |distinguished from the effects of the intervention the |
|the researcher realizes that Mrs. Harmony has a very different personality and |study is meant to test. |
|interpersonal style than Mr. Gutsly: she tries to promote kindness and good will, whereas| |
|Mr. Gutsly is mainly interested in promoting competitiveness and not being wimps. What | |
|would we say about the variables of teacher personality and interpersonal style in this | |
|study? | |
|Q30. Someone finds that the degree of physical fitness in youth (as measured by heart |A30. That both fitness and ankle injuries are |
|rate recovery from exercise) is correlated with the number of ankle injuries the person |associated with more running or more athletic activity |
|has had. But before concluding that we should hurt the ankles of youth in order to make |– both are responses to this basic causal variable. |
|them more fit, a COMMON RESPONSE explanation for the association comes to mind. Can you | |
|posit this common response explanation? | |
|Q31. Even when causation is present, is there usually one and only one contributing cause|A31. No. |
|for a given effect, at least in the types of phenomena people study with statistics? | |
|Q32. Someone says, “Lots of kids play ‘shooter’ video games for hundreds of hours, and |A32. Another way of stating this principle is that one |
|never do anything violent. Therefore these games can’t cause violence.” What does the |phenomenon does not have to be a necessary and |
|principle, as stated in your text, that “Even when direct causation is present, it is |sufficient condition for a second, in order to be |
|rarely a complete explanation of an association between two variables” have to do with |causally related. Therefore one or several instances of|
|this reasoning? |non-association do not disprove a causal relationship. |
|Q33. What is the strongest type of evidence for causal relations? |A33. Well-designed experiments that are meant to |
| |control for all lurking variables. (These usually |
| |entail randomly assigning individuals to different |
| |conditions.) |
|Q34. What’s the problem with doing a well-designed experiment, for example, to see what |A34. We will never find it ethical to randomly assign |
|the effects of child abuse are? |children to conditions of child abuse versus nonabuse. |
|Q35. Is it possible to come to valid causal inferences without doing experiments that |A35. Although your text says that “the only fully |
|randomly assign people to various conditions? Can you give an example of such? |compelling method” of establishing causality is an |
| |experiment, we can and do come to valid causal |
| |inferences without randomly assigning people to |
| |conditions. The example of smoking and lung cancer is |
| |one where the evidence for causation is “overwhelming” |
| |despite no study in which people were randomly assigned|
| |to smoke or not smoke over many years. |
|Q36. A two-way table describes the relation between two of what kind of variables? |A36. Categorical. |
|Q37. When you look at a two-way table that looks like this: |A37. The party affiliation is the row variable and the |
|                     Approval of president’s performance |approval of the president’s performance is the column |
|Party affiliation    Yes    No     Total |variable. |
|Democrat             25     100    125 | |
|Republican           125    5      130 | |
|Total                150    105    255 | |
|What is the row variable, and what is the column variable? | |
|Q38. If we look at the row totals in the table above, we see how many Democrats and how |A38. Marginal distributions. (Because they’re in the |
|many Republicans are in the sample. Similarly, the column totals tell |right and bottom margins of the table.) |
|us how many approvers and disapprovers are in our sample. These give us the distribution | |
|for each variable separately, in our sample. These distributions are called what? | |
|Q39. The above table gives the results in counts. Especially when the marginal |A39. To per cents (or fractions). |
|distributions are not equal (for example, if the sample should contain twice as many | |
|Republicans as Democrats) we should convert the count data to what kind of data? | |
|Q40. True or false: When describing the relationship between two quantitative variables, |A40. True. |
|the scatterplot and the correlation coefficient are usually the graph and numeric measure| |
|of choice; but in describing the relation between two categorical variables, no single | |
|graph or numeric measure summarizes the strength of the association. We usually pick and | |
|choose among bar charts and pie charts and the reporting of various per cents. | |
|Q41. Someone looks at a sample of 500 men and 100 women. 250 men oppose the war, whereas |A41. He should not just use the counts, but find the |
|80 women oppose the war. The researcher says, “Lots more men than women oppose the war. |per cents. 50% of men, but 80% of women, in this sample|
|Therefore the idea that women in this area are more pacifist is incorrect.” What’s the |opposed the war. So in this region it looks like the |
|problem with this reasoning, and what should the researcher do? |women are more anti-war than the men. |
|Q42. Suppose you have three age groups, and you have data on how many individuals got |A42. A conditional distribution. |
|educated to each of 4 different levels. Suppose you calculate, just for one of the age | |
|groups, the per cent of people in that age group who attained each level. This | |
|distribution of per cents for one age group is called what? | |
|Q43. Do the per cents for a conditional distribution add to 100 for each of the different|A43. Yes. |
|groups for which you calculate them? | |
|Q44. Do the per cents for conditional distributions equal the per cents for marginal |A44. No, not necessarily. |
|distributions? | |
|Q45. There were two AP Statistics teachers. 40% of the 40 students in the first teacher’s|A45. The second teacher, because a higher fraction of |
|classes got 5’s, and 25% of the 40 students in the second teacher’s classes got 5’s. |that teacher’s students got 5’s both among those above |
|People assumed that the first teacher was better. However, someone then studied the |the cutoff and among those below it. |
|results based on whether or not the students scored above or below a certain cutoff on | |
|the SAT, before going into AP Statistics. The first teacher had 80% of students above | |
|this cutoff and 20% below. The second teacher had 20% above and 80% below. The first | |
|teacher had 50% of the “aboves” get 5’s, and none of the “belows.” The second teacher | |
|had 75% of the “aboves” get 5’s, and 12.5% of the “belows.” Now which teacher appears to| |
|be better, and why? | |
|Q46. The situation above is whose paradox? |A46. Simpson’s. |
|Q47. True or false: In Simpson’s paradox, there is a lurking variable, which predisposes |A47. True. |
|the results against one of the two groups; controlling for the effects of that lurking | |
|variable by looking separately at the subsets formed by the categories of it reveals | |
|results in the opposite direction from those obtained when ignoring the lurking variable.| |
|Q48. If a lurking variable can actually reverse the direction of results, do you think it|A48. Yes. |
|is also possible that a lurking variable could result in lack of an observed association | |
|when in fact there is a causal influence? | |
|Q49. Does the fact that lurking variables can obscure influences that are actually |A49. Yes. |
|present imply that: not only does correlation not imply causation, but lack of | |
|correlation does not rule out causation? | |
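The teacher comparison in Q45 of this chapter can be worked through with counts. Below is a minimal sketch: the counts are derived from the percentages stated in Q45 (for example, 80% of 40 students is 32 "aboves," and 50% of those 32 is 16 students with 5's), and the names results, conditional_rates, and overall_rate are made up for this illustration. It shows the overall per cents favoring the first teacher while the conditional per cents within each SAT group favor the second.

```python
# Counts derived from the percentages in Q45:
# each entry is (students who got a 5, students in that group).
results = {
    "first teacher": {"above cutoff": (16, 32), "below cutoff": (0, 8)},
    "second teacher": {"above cutoff": (6, 8), "below cutoff": (4, 32)},
}


def conditional_rates(groups):
    """Fraction of 5's within each SAT group (the conditional distributions)."""
    return {name: fives / n for name, (fives, n) in groups.items()}


def overall_rate(groups):
    """Fraction of 5's among all of a teacher's students, ignoring SAT group."""
    total_fives = sum(fives for fives, _ in groups.values())
    total_students = sum(n for _, n in groups.values())
    return total_fives / total_students


for teacher, groups in results.items():
    print(teacher, "overall:", overall_rate(groups))
    for name, rate in conditional_rates(groups).items():
        print("   ", name, rate)
```

Ignoring the SAT cutoff, the first teacher looks better (40% versus 25%). Within each SAT group the second teacher does better (75% versus 50% among the "aboves," 12.5% versus 0% among the "belows"). The reversal comes from the lurking variable: most of the first teacher's students started above the cutoff.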
TPS Chapter 5: Producing Data
|Q1. The difference between an observational study and an experiment is that in the first,|A1. Deliberately imposed, manipulated. |
|the explanatory variable is observed and measured, whereas in an experiment, the | |
|explanatory variable is ____. | |
|Q2. When there is a jobs program for welfare recipients, and you simply observe that |A2. The effects of the program are confounded with the |
|those who voluntarily take part in the program do better than those who don’t, what’s the|characteristics that lead people to seek the program, |
|problem with inferring that the program causes better results? |for example motivation and values. |
|Q3. The entire group of individuals we want information about is called the _____. |A3. Population |
|Q4. The subset of the population we actually examine in order to gather information is |A4. Sample. |
|called the ______. | |
|Q5. Studying the whole population by attempting to contact every individual is called |A5. Census. |
|conducting a ______. | |
|Q6. Studying a population by taking a subset of it in order to generalize to the whole |A6. Sampling. |
|population is called _____. | |
|Q7. The method used for selecting the sample from the population is called the ____ of |A7. Design. |
|sampling. | |
|Q8. If a radio station invites anyone who wants to call and give an opinion on a |A8. Voluntary. |
|question, the set of people thus obtained is called a _____ response sample. | |
|Q9. If the researcher enrolls a group of people in the study on the basis of how easy it |A9. Convenience. |
|is to contact them and get them to enroll, that method of sampling is called ______ | |
|sampling. | |
|Q10. The systematic error introduced when the sample is very different from the |A10. Bias. |
|population is called ____. | |
|Q11. If a conservative radio commentator polls his listeners, and a liberal commentator |A11. It’s very likely that these samples differ highly |
|polls her listeners, both polls are likely to be biased as methods of ascertaining the |from the country as a whole. |
|sentiment of the country, because _______. | |
|Q12. An SRS, or simple random sample, is a subset of n individuals from a population, |A12. Every subset of n individuals has an equal chance |
|chosen in such a way that ____. |of being chosen for the sample. |
|Q13. True or false: if every individual in the population has an equal chance of being |A13. False. You need not only this condition, but also |
|included in the sample, the sample is a simple random sample. |that every subset of the population of size n is |
| |equally likely to be chosen. |
|Q14. Suppose I take the numbers 1, 2, 3, and 4, and write them on identical pieces of |A14. Yes. |
|paper, put them into a hat and mix them thoroughly, and draw out two numbers. Is this a | |
|simple random sample of the 4 numbers? | |
|Q15. Suppose I take the numbers 1, 2, 3, and 4. First I take the numbers 1 and 2 and put |A15. 0.5 |
|them into a hat, and choose one of them. Then I take the numbers 3 and 4 and put them | |
|into a hat and choose one of them. For each of the numbers 1, 2, 3, and 4, what is the | |
|probability that this number will end up in the sample? | |
|Q16. Is it possible that the subset {1,2} would be chosen for our sample using the |A16. No. |
|sampling method just mentioned (that is, pick randomly from 1 and 2, then pick randomly | |
|from 3 and 4)? | |
|Q17. So the sampling method just mentioned is one where each individual has equal |A17. Is not. |
|probability of being chosen, but each subset is not equally likely to be chosen; thus the| |
|sample obtained is, or is not, a simple random sample? | |
|Q18. In a table of random digits, each triple of digits is equally likely to be any of |A18. 1000, 000, 999 |
|the ____ possibilities from _____ to ______. | |
|Q19. The two rhyming words (with different ways of spelling the second syllable) that |A19. Label and table. |
|summarize the process of using a table of random digits to select a simple random sample | |
|are ___ and ____. | |
|Q20. There are 7 members in a class. Please describe how you would use a table of random |A20. Assign each of them a single-digit label. Enter |
|digits to select a simple random sample of 3 of them. |the random number table at any point, and look at the |
| |digits in order. If a digit isn’t one of the labels you|
| |assigned, or repeats a label already chosen, ignore it |
| |and go to the next. Otherwise, put that individual in |
| |the sample. Keep going until you have put 3 individuals|
| |in the sample. |
|Q21. A sample chosen by chance is called a ____ sample. |A21. Probability |
|Q22. Suppose there is a class, and someone wants to choose a random sample of it. But the|A22. Stratified |
|researcher wants to make sure that both males and females are adequately sampled. So the | |
|researcher takes the names of the girls, and draws a simple random sample of them, and | |
|then does the same with the boys’ names. The total sample thus obtained is not a simple | |
|random sample, but a _____ random sample. | |
|Q23. Suppose a researcher wants to collect a random sample of high school students in the|A23. Multistage sampling. |
|U.S. The researcher first takes a simple random sample of counties in the country, then | |
|takes a simple random sample of high schools within each county, and then a simple random| |
|sample of students within each high school. This sampling method is called ____. | |
|Q24. The above method of sampling high school students leaves out homeschoolers. The |A24. Undercoverage. |
|general term for such a problem in sampling is ____. | |
|Q25. When you get a survey in the mail and immediately toss it in the trash, the source |A25. Nonresponse. |
|of bias this introduces into the survey is called _____. | |
|Q26. If you were asked what is the “essential principle of statistical sampling,” would |A26. A probability sample, because the most essential |
|you say that it’s to have a simple random sample, a probability sample, a stratified |factor is that the sample be chosen by chance. |
|sample, or a multistage sample? | |
|Q27. During recent decades, society has become less and less tolerant of any sexual |A27. Response |
|activity between therapists and their clients. Surveys of the incidence of such behavior | |
|are now almost impossible to obtain, because therapists would avoid trusting a researcher| |
|with a confession of behavior that would lead to severe penalties. The bias this | |
|introduces into any such survey is called ______ bias. | |
|Q28. One survey question asks, “Do you believe that children should be legally protected |A28. Wording. |
|from exposure to violent models on TV that can lead them to commit acts of violence?” And| |
|a separate question asks, “Do you believe that government should limit the free | |
|expression of ideas by censoring television?” The major difference in results these | |
|questions would yield would be referred to as ______ effects. | |
|Q29. Which would give more accurate results in a poll: a probability sample of 1000 |A29. The probability sample of 1000. |
|people, or a voluntary response sample of 100,000 people? | |
|Q30. A study in which we actually do something to people, animals, or objects in order to|A30. Experiment. |
|learn about the response is called an _____. | |
|Q31. The individuals on which an experiment is done are called the experimental whats? |A31. Units. |
|Q32. When the experimental units are human beings, according to our book they are called |A32. Subjects. |
|____ (although the preferred term among psychological researchers these days is | |
|“participants”). | |
|Q33. The thing that is done to the subjects (or participants) (for example giving them a |A33. Treatment. |
|drug or teaching them to read) is called a ____. | |
|Q34. Suppose that in an experiment, learning of math facts is your response variable. You|A34. Factors. |
|are studying two explanatory variables, and varying them systematically in your study: | |
|amount of practice, and the frequency of recurrence of any one math fact in a practice | |
|session. These two explanatory variables are called the two _____ in the experiment. | |
|Q35. In an experiment on math facts, one is studying the frequency of recurrence of any |A35. Levels. |
|one math fact in practice sessions: does, for example, 7+8 occur every 3 problems, every | |
|20 problems, or every 90 problems? If the experiment is set up like this, within the | |
|factor called “frequency of recurrence” there are three different degrees, or specific | |
|values, of that factor, which in the jargon are called three ____ of that factor. | |
|Q36. A pill that is made of inactive material, which is used so that subjects can have |A36. Placebo. |
|information withheld about which treatment group they are in, is an example of a _____. | |
|Q37. If you want evidence for causation, and if you want to study the interactions of |A37. Experiment. |
|factors, and you are able to do either an observational study or an experiment, you | |
|should, all other things equal, choose the ______. | |
|Q38. When people get better from an inactive treatment, that is called the ____ effect. |A38. Placebo. |
|Q39. A group of individuals who receive an inactive treatment, so that the effects of a |A39. Control. |
|possibly active treatment can be contrasted with those of inactive treatment, is called a| |
|____ group. | |
|Q40. A researcher tries to make two treatment groups equal on every variable other than |A40. There are too many lurking variables – the |
|the treatment of interest. The researcher does this by fashioning two groups that are |experimenter may not measure all of them, and some of |
|very similar on several variables relevant to outcome. What is the problem with this |them may not become apparent until after the |
|method? |experiment. Some of them may not be measurable at all. |
|Q41. What’s the “gold standard” method of assuring the equivalence of two treatment |A41. To randomly assign subjects to treatment groups. |
|groups? | |
|Q42. Suppose you first assemble pairs of subjects that are very similar on the |A42. Matching. |
|preintervention measure of the response variable. Then you randomly assign one member of | |
|each pair to the experimental group and the other to the control group. This method | |
|combines random assignment with ______. | |
|Q43. A researcher is studying the effect of two methods of teaching reading. Instead of |A43. Yes. Understanding why this is true is central to |
|matching the subjects on their reading level and then randomly assigning the members of |the logic of experimental design. |
|each pair to the two groups, the researcher ignores the initial reading level for | |
|purposes of assignment to groups, and instead picks a simple random sample of the whole | |
|set of subjects to be in each group. Is this an acceptable method of assignment to | |
|groups? | |
|Q44. True or false: When subjects are assigned at random to two groups that get |A44. False. It is also possible that the “play of |
|different treatments, and the groups differ on the response variable, it must be true |chance in the random assignment” accounts for the |
|that the treatment accounts for the difference between the groups. |difference in the groups. (However, the likelihood of |
| |this alternative explanation can be quantified, and |
| |when it is small enough, the other explanation is |
| |favored.) |
|Q45. Suppose that both treatments studied in an experiment in fact have no causal |A45. A very small sample size. |
|influence upon the response variable. Under what conditions are we more likely to see big| |
|differences between the two groups, due to the vagaries of random assignment: with a very| |
|small sample size, or a very big sample size? | |
|Q46. A difference between groups that is so large (and with so many subjects) that it |A46. Statistically significant. |
|would “rarely” (i.e. to whatever criterion of rarity we specify) occur by chance is | |
|called a _____ ______ effect. | |
|Q47. The three central principles of experimental design are _____ (which is making |A47. Control, random assignment, and replication. (The |
|comparisons between groups), _______ (a method of assigning individuals to groups), and |word replication here refers to repeating the |
|_______ (which has to do with how many individuals you have in your groups). |observation on more subjects within a given experiment.|
| |The word is also used, in a different sense, to refer |
| |to repeating the experiment.) |
|Q48. Suppose that we want to study the effect of a new curriculum and an old one, on |A48. Completely randomized. |
|reading skills. We also want to study the effects of whether the curriculum is delivered | |
|in person or over the phone. We randomly assign subjects to the curriculum, but we can’t | |
|randomly assign them to in person or over the phone, because certain people live too far | |
|away to get the training in person. So subjects are allocated to the curriculum at | |
|random, but not to the delivery method. We would say that the experimental design here is| |
|not _____ ________. | |
|Q49. In a “double-blind” experiment, what two sets of people are “blind” to which group |A49. The subjects themselves, and the research staff |
|the subject is in? |who have contact with them. |
|Q50. What’s a problem in making inferences from experiments that is often less of a |A50. The problem of “lack of realism,” in other words, |
|problem in observational studies? |the problem that the conditions in the study do not |
| |match those to which we wish to generalize. |
|Q51. How do you do random assignment in a matched pairs design? |A51. First choose pairs that are as similar as |
| |possible, then randomly choose one subject from each |
| |pair to get the first treatment; the other subject |
| |gets the second treatment. |
|Q52. In a certain type of matched pair design where each subject serves as his or her own|A52. Whether the subject gets treatment 1 first or |
|control, and each “pair” consists of only one individual, what is randomly assigned? |treatment 2 first. |
|Q53. Suppose that we want to compare two methods of tutoring in reading that children |A53. We randomly assign the students of the first |
|receive after school. We know that the student's teacher also has an important influence |teacher to the two groups, then do the same for the |
|on the outcome variable, which is reading skill. Please describe how we would use a |students of the second teacher, and so forth, rather |
|block design to control the effect of teacher when studying the method of teaching |than using a simple random sample of all students. |
|reading. | |
|Q54. True or false: If we want to make separate conclusions about males and females in a |A54. True. |
|study, it’s a good idea to block on gender when making our assignment to groups. | |
|Q55. Making a model that accurately reflects the experiment under consideration and |A55. Simulation. |
|imitating chance behavior based on that model is called doing a _____. | |
|Q56. What are the 5 steps of doing simulations? |A56. State problem, state assumptions, assign digits to|
| |represent outcomes, simulate repetitions, state |
| |conclusions. |
|Q57. Someone wants to simulate a situation where there’s a 3/10 chance that a child |A57. Yes, there is a problem. There are 4 digits from 0|
|will be involved in bullying. The person assigns the digits 0 to 3 to “involved in |to 3 inclusive, and 6 other digits, so the person would|
|bullying,” and the rest of the digits to “not involved in bullying.” Do you have a |be simulating a 40% probability situation rather than a|
|problem with this? If so, what’s your problem? |30% probability. |
|Q58. Please use your calculator to generate 4 random integers in the range from 0 to 99. |A58. On the TI-83 or 84, press MATH > PRB > 5:randInt( |
|Please tell what you entered on your calculator to get these, and what 4 integers you |and enter randInt(0,99,4). On the TI-89, press CATALOG,|
|got. |F3, then scroll down to randInt and hit enter. Then you|
| |insert 0, 99, 4 in the parentheses. You’ll get |
| |different sets of numbers each time, unless something |
| |very unlikely happens! |
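
Note on Q57 and Q58: the digit-assignment check and the random-integer draw can also be tried off the calculator. What follows is only a minimal sketch in Python, not part of the textbook's method; the function names (share_of_digits, simulate, rand_int) are made up for illustration, and Python's random module stands in for the calculator's randInt.

```python
import random

def share_of_digits(assigned):
    """Exact chance that one random digit 0-9 falls in the assigned set."""
    return len(set(assigned) & set(range(10))) / 10

def simulate(assigned, repetitions, rng):
    """Draw random digits and return the fraction that land in the assigned set."""
    hits = sum(rng.randrange(10) in assigned for _ in range(repetitions))
    return hits / repetitions

def rand_int(low, high, count, rng):
    """Rough stand-in for randInt(low, high, count); both bounds are inclusive."""
    return [rng.randint(low, high) for _ in range(count)]

rng = random.Random()  # unseeded, so each run gives different numbers

# Q57: digits 0-3 cover four of the ten digits (40%), not the intended 30%.
print("digits 0-3:", share_of_digits({0, 1, 2, 3}))
# Digits 0-2 match the intended 3/10 chance.
print("digits 0-2:", share_of_digits({0, 1, 2}))
# Long-run fraction from many simulated repetitions, using digits 0-2.
print("simulated :", simulate({0, 1, 2}, 10_000, rng))

# Q58: four random integers from 0 to 99.
print("four integers:", rand_int(0, 99, 4, rng))
```

With many repetitions the simulated fraction should settle near 0.3 for digits 0 to 2, which is why the digit assignment has to match the stated probability before any repetitions are run.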
TPS Chapter 6: Probability
|Q1. The branch of mathematics that deals with the pattern of chance outcomes is ____. |A1. Probability |
|Q2. The big idea of the study of probability is that chance behavior is unpredictable in |A2. short run, long run |
|the _____ but has a regular and predictable pattern in the _____. | |
|Q3. An illustration of the “big idea” mentioned in Q2 is that while it is unpredictable |A3. Fraction of heads in a very large number of tosses |
|whether a single coin toss will come out heads, the ________ is almost always very close | |
|to .5. | |
|Q4. What is the difference between a changing, or variable, phenomenon that is “random” |A4. A random phenomenon is uncertain with respect to |
|and one that is not? |individual outcomes, but nonetheless there is a regular|
| |distribution of outcomes in a large number of |
| |repetitions. |
|Q5. The ____ of any outcome of a random phenomenon is the proportion of times the outcome|A5. Probability. |
|would occur in a very long series of repetitions, i.e. long-term relative frequency. | |
|Q6. When there are independent trials, that means that the outcome of one trial _______. |A6. Does not influence the outcome of another. |
|Q7. The set of all possible outcomes of a random phenomenon is called the ______. |A7. Sample space. |
|Q8. An event is defined as a subset of ____. |A8. The sample space. |
|Q9. When we make a mathematical description of a random phenomenon by describing a sample|A9. Probability model. |
|space and a way of assigning probabilities to events, we are constructing a ____. | |
|Q10. Jane has 2 shirts and 3 pairs of pants. If we want to picture the 6 ways she can |A10. Tree diagram. |
|dress in these garments, we can draw a diagram with a bifurcation point at the left of | |
|the page, with two lines going out to two points called “red shirt” and “brown shirt.” | |
|From each of these, you then draw 3 lines, saying “blue pants,” “green pants,” and “black| |
|pants.” This sort of picture is called a _____. | |
|Q11. Jane has 2 shirts and 3 pairs of pants. The “Cartesian Product” of these two sets |A11. multiplication, ab |
|produces 6 possible combinations. This illustrates what our book calls the _____ | |
|principle, which says that if you can do one task in a ways, and another in b ways, you | |
|can do both together in _____ ways. | |
|Q12. Please give an example of sampling with and without replacement. |A12. As one of many possible examples: in sampling |
| |without replacement, you draw first one, then another |
| |card from a deck without putting the first card back. |
| |In sampling with replacement, you draw one card from |
| |the deck, note its identity, replace it, shuffle the |
| |deck, draw again, and note the identity of the second |
| |draw. |
|Q13. The probability of any event A has to satisfy the inequality x ................