Walton Math Pages



Section 3.1
Introduction to Linear Regression

Regularly, since 1937, the Gallup Poll has asked likely U.S. voters to answer the question “Would you vote for a qualified woman for president if your preferred political party nominated one?” Are people more likely to answer yes to this question now than they were 70 years ago? If so, has the increase been consistent, or do you think there might have been periods when the “yes’s” didn’t increase, or even decreased? Here we show the percentage saying yes to the above question in the 16 years when the survey took place.

[Scatterplot: percentage answering yes, by survey year]

The above graph is called a scatterplot, in which we observe the association between two quantitative variables. We call these the explanatory and response variables. In this example, identify the

Explanatory Variable:
Response Variable:

When looking at a scatterplot and describing the association, there are 3 basic things to look for:

Direction: positive or negative
Form: straight line, curved, something exotic, or no apparent pattern
Strength: how much scatter is there?

Guess the Association

Below you will find six scatterplots of hypothetical exam scores. Your task is to evaluate the direction and strength of the association between scores on the first exam and scores on the second exam for each of the hypothetical classes A-F. Do so by filling in the table below with the letters A-F. You will be ranking them from the strongest to the weakest in both the positive and negative direction. Each letter can be used only once.

            Most Strong    Moderate    Least Strong
Negative
Positive

[Scatterplots A, B, C, D, E, and F]

Scatterplot Exercises

1. Suppose you were to collect data for each pair of variables. You want to make a scatterplot. Which variable would be used as the explanatory variable and which would be used as the response variable? Why?
Discuss the direction, form and strength.
A) Long-distance calls: time (minutes) and cost
B) Lightning strikes: distance from lightning and time delay of thunder
C) A streetlight: its apparent brightness and your distance from it
D) Cars: weight of car and age of owner

2. Which of the scatterplots show:
   Little or no association?
   A negative association?
   A linear association?
   A moderately strong association?

[Scatterplots]

3. Which of the scatterplots show:
   Little or no association?
   A negative association?
   A linear association?
   A moderately strong association?

[Scatterplots]

4. A study examined brain size (measured as pixels counted in a digitized magnetic resonance image [MRI] of a cross section of the brain) and IQ (for performance scales of the Wechsler IQ test) for college students. The following scatterplot is of the data. Comment on the association between brain size and IQ as seen in the scatterplot.

[Scatterplot]

5. The fastest horse in Kentucky Derby history was Secretariat in 1973. The following scatterplot shows the speed (in mph) of the winning horses each year. What do you notice about the scatterplot? In sports, we usually expect to see an improvement in a positive direction. Is the scatterplot's form linear? Do you believe that performance has increased at the same rate throughout the last 125 years?

[Scatterplot]

6. Suppose you were to collect data for each pair of variables. You want to make a scatterplot. Which variable would be used as the explanatory variable and which would be used as the response variable? Why? Discuss the direction, form and strength.
A) Apples: weight in grams and weight in ounces
B) Apples: circumference in inches and weight in ounces
C) College freshmen: shoe size and GPA
D) Gasoline: number of miles driven since filling up and gallons remaining in the tank

Using the Graphing Calculator to Create Scatterplots

For each data set below create a scatterplot using your graphing calculator. Sketch the graph with appropriate scale and labels.
Then, describe the direction, form and strength of the plot. Finally, write a few sentences describing what the scatterplot tells you about the two variables.

1) Fuel economy. Here are advertised horsepower ratings and expected gas mileage for several 2001 vehicles.

Car               HP    mpg
Audi A4           170   22
Buick LeSabre     205   20
Chevy Blazer      190   15
Chevy Prizm       125   31
Ford Excursion    310   10
GMC Yukon         285   13
Honda Civic       127   29
Hyundai Elantra   140   25
Lexus 300         215   21
Lincoln LS        210   23
Mazda MPV         170   18
Olds Alero        140   23
Toyota Camry      194   21
VW Beetle         115   29

2) Burgers. Fast food is often considered unhealthy because much fast food is high in both fat and sodium. But are the two related? Here are the fat and sodium contents of several brands of burgers. Create a scatterplot and find the correlation between fat content and sodium content. Write a description of the association.

3) Burgers. In the previous exercise you examined the association between the amounts of fat and sodium in fast food hamburgers. What about fat and calories? Here are data for the same burgers.

4) Drug abuse. A survey was conducted in the United States and 10 countries of Western Europe to determine the percentage of teenagers who had used marijuana and other drugs. The results are summarized in the table.

5) Lunchtime. Does how long children remain at the lunch table help predict how much they eat? The table gives data on 20 toddlers observed over several months at a nursery school. "Time" is the average number of minutes a child spent at the table when lunch was served. "Calories" is the average number of calories the child consumed during lunch, calculated from careful observation of what the child ate each day.

Scatterplots and Correlation Coefficients

The scatterplots below show how different patterns of data produce different degrees of correlation.

[Scatterplots, in order:]
Maximum positive correlation (r = 1.0)
Strong positive correlation (r = 0.80)
Zero correlation (r = 0)
Maximum negative correlation (r = -1.0)
Moderate negative correlation (r = -0.43)
Strong correlation & outlier (r = 0.71)

Several points are evident from the scatterplots.

When the slope of the line in the plot is negative, the correlation is negative; and vice versa.
The strongest correlations (r = 1.0 and r = -1.0) occur when data points fall exactly on a straight line.
The correlation becomes weaker as the data points become more scattered.
If the data points fall in a random pattern, the correlation is close to zero.
Correlation is affected by outliers. Compare the first scatterplot with the last scatterplot. The single outlier in the last plot greatly reduces the correlation (from 1.00 to 0.71).

Remember strength, form and direction? We have a value that helps with this called the correlation coefficient (or r for short).

The correlation r measures the strength of the linear relationship between two quantitative variables.
r is always a number between -1 and 1, inclusive.
r > 0 indicates a positive association.
r < 0 indicates a negative association.
The extreme values r = -1 and r = 1 occur only in the case of a perfect linear relationship.

[Number line of r values: -1, -0.8, -0.5, 0, 0.5, 0.8, 1]

Section 3.2

Track and Field Day:

The table below shows data for 13 students in a statistics class. Each member of the class ran a 40-yard sprint and then did a long jump (with a running start).
Make a scatterplot of the relationship between sprint time (in seconds) and long jump distance (in inches).

Sprint Time (s)          5.41  5.05  9.49  8.09  7.01  7.17  6.83  6.73  8.01  5.68  5.78  6.31  6.04
Long Jump Distance (in)  171   184   48    151   90    65    94    78    71    130   173   143   141

A. Let’s discuss this graph!
B. What is the correlation coefficient for this data?
C. What happens to the scatterplot and the correlation coefficient if a 14th student enters the class with a sprint time of 7.23 seconds and a long jump of 220 inches?

Cricket Chirps

At summer camp, one of Carla’s counselors told her that you can determine the air temperature based on the number of cricket chirps you hear.

1. What is the explanatory variable and what is the response variable in the context of this problem?

To determine the formula, Carla collected data on temperature and number of chirps per minute on 12 occasions. She entered the data into the calculator and found the summary statistics below:

2. Use the information above to find the least squares regression line (LSRL).
3. Interpret the slope and y-intercept in the context of this problem.
4. One of Carla’s data points was on a particularly hot day (93°F). She counted 249 cricket chirps in one minute. What temperature would Carla’s model predict for this number of chirps? How close is this to the actual temperature that day?

Ginny the Giant

Ginny is a giant that lives in the land of Gumdrops. She will be taking her first airplane trip soon and is very excited! However, she is a bit concerned about her weight. Recently the only airline in Gumdrops had to enforce weight restrictions on its passengers because its airplanes were starting to droop from being too heavy! Therefore the airline sent the following list to the citizens of Gumdrops and will adhere strictly to the weight limits included. Since Ginny is the only giant in Gumdrops, there is no weight limit for her height.
How can she find out the weight restriction for her height?

Weight Limits for Passengers on Gumdrop Express

Height (ft, in)   Height (in)   Weight (lb)
4’10”             58            138
5’0”              60            148
5’2”              62            158
5’4”              64            169
5’6”              66            179
5’8”              68            190
6’0”              72            213
6’2”              74            225
6’4”              76            238
6’6”              78            250

1. Sketch a scatterplot of the data based on what you see on the calculator. Describe the relationship between the two variables.
2. From the calculator, find the equation of the LSRL and be sure to define any variables.
3. Interpret the slope and y-intercept of the LSRL.
4. Find the correlation coefficient and interpret what it means.
5. Based on what you have found so far, do you think a linear model fits this data quite well? If so, what would you predict Ginny’s maximum weight limit to be if she is 9’6” tall?
6. Sketch a residual plot to see if the linear model is a good fit for this data.

The following table shows the grams of fat and the number of calories for several entrees at an Italian restaurant. These data points are graphed on the scatterplot that follows the table.

That's Amore Restaurant

You will be entering all of this data in your calculator and finding specific information. If you feel that you need extra practice in using the formulas to find this information, this is a good data set to practice on.

Describe the explanatory / response relationship.
Estimate the correlation coefficient and describe what it tells you.
Find the correlation coefficient from your calculator and see how close you were.
Find the center of gravity. Mark it on your graph.
Find the least squares regression line from your calculator. Write it here with decimals rounded to the nearest thousandth.
Interpret the slope of this line in terms of the data.
Draw the LSRL on your graph. What two points can you use to draw this line?
Consider the Chicken Primavera. What is the observed value (y)? What is the predicted value (ŷ)? Calculate the residual for Chicken Primavera. What does this residual tell you?
Give an example of a menu item that has a negative residual. Calculate its residual and explain what it means.
Suppose I knew that Fettucini Alfredo has 25 fat grams. Make a prediction based on your LSRL for the number of calories in Fettucini Alfredo. If the residual for Fettucini Alfredo is -45, what is the actual number of calories in this menu item?

Exploring Scatterplots
AP Stats, Walton

Net books, a smaller alternative to laptop computers, are very popular because of their portability and relatively low cost. They are also popular because their batteries tend to last longer than batteries in laptops. Consumer Reports did a study of 22 net books in their February 2010 edition. Among the variables they measured were battery life (hours), weight (pounds), and cost. The data appear in the table below. One question we would like to explore in this activity: “How is battery life associated with the weight of the net book?”

Construct the scatterplot and then describe the scatterplot of battery life vs. weight.
List the following values:
What does the correlation coefficient indicate about the relationship of battery life and weight?
There are two points that lie “outside” the group of points: (3.2, 6.5) and (2.4, 5.5). Describe each point as either an outlier or an influential point. Explain your reasoning.
Change the coordinates of (3.2, 6.5) to (3.2, 9.7). What happened to the correlation? Be specific in your response.
Now change the coordinates of your point back to (3.2, 6.5) and switch (2.4, 5.5) to (2.4, 2.5). What happened to the correlation when you changed this point’s coordinates? Be specific in your response.
Did this confirm or change your initial decision as to whether the point was an influential point or an outlier?
Write a statement describing the slope, in context.
Write a statement describing the y-intercept, in context.
Create a residual plot for your data:
Based on the value you just calculated (r-squared) and the plot of the residuals, how well does the model fit the data?
What does the coefficient of determination tell us?
If s = 2.345, describe what this means in context.

Summarizing Bivariate Data

The following are True/False questions to check your understanding of the concepts, facts, and terminology with regard to regression analysis. Read carefully before answering.

1. If on average y increases as x increases, the correlation coefficient is positive.
2. The correlation coefficient does not depend on the units of measurement of the two variables.
3. The value of the correlation coefficient is always between 0 and 1.
4. If r is close to 1, then the points lie close to a straight line with a positive slope.
5. The slope of the least squares line is the average amount by which y increases as x increases by one unit.
6. The least squares line always passes through the point (x̄, ȳ).
7. The slopes of the least squares lines for predicting y from x, and the least squares lines for predicting x from y, are equal.
8. The higher the value of the coefficient of determination, the greater evidence for a causal relationship between x and y.
9. The value of the residual plus the predicted value ŷi is equal to yi.
10. The coefficient of determination is equal to the positive square root of the correlation coefficient.
11. Since the least squares line minimizes the sum of the squared residuals, it is possible to find this sum when given a set of bivariate data.
12. There is a correlation of r = 0.54 between the position of a football player and his weight.
13. The square of the slope equals the proportion of variation in the response variable that is explained by the regression line.
14.
A study shows that students who spend more time studying for statistics tend to achieve higher scores on their tests. In fact, the regression on number of hours studied turned out to explain 81% of the observed variation in test scores among students who participated in the study. It is safe to assume that the correlation between number of hours studied and test scores is 0.9.

Section 3.3
Regression Practice

Investigate each data set below to determine if a linear, power, or exponential model will be the best model for the data.

1. Mrs. Murray’s Physics class is studying inclined planes. They roll a ball down an inclined plane and measure the total distance traveled by the ball as a function of time. They obtain the following data:

Time (sec)      1    2    3    4    5    6    7    8
Distance (cm)   3    11   29   47   77   106  149  196

“Now what do we do with the data?” asks a student. Mrs. Murray decides to have the AP Statistics students analyze the results and develop the best model for predicting the distance traveled by the ball after 12 seconds. Now we must decide and answer the question.

2. Oil spills often occur in lakes and oceans, creating a thin layer of oil on the surface of the water in the shape of a circle (ignoring wind, current, and tides). Aerial photography makes it easy to determine the radius of the circle created by the oil spill. An environmental protection company wants to be able to estimate the volume of an oil spill from the known radius of the spill. To create a prediction model, the company conducts a study in which various amounts of oil are spilled into a controlled testing pool of water and the radius of the spill and the volume for each amount are recorded. Here are the results:

A scatterplot of the data follows. Note the clear curved shape.

We took the natural logarithm (base e) of the values for both variables.
Some computer output from a linear regression analysis on the transformed data is shown below.

(a) Based on the output, explain why it would be reasonable to use a power model to describe the relationship between volume and radius for oil spills.
(b) Give the equation of the least-squares regression line. Be sure to define any variables you use.
(c) Suppose an oil spill created a circle with a radius of 2.9 meters. Use the model from part (b) to predict the volume of the oil.

3. Wolves are among the species protected by the Endangered Species Act. Because the number of wolves in the United States was reduced and even eliminated in some areas, a program to reintroduce wolves was created to try to rebuild the population. The table below shows the population of wolves living in a particular wildlife preserve.

Year         1989  1990  1991  1992  1993  1994  1995  1996  1997  1998
Population   10    12    16    23    27    34    43    55    75    93

Find the best model for this data and predict the population of wolves in the wildlife preserve for the current year.

4. A small amount of detergent was dropped onto a still tank of water, and the area A of water (in square centimeters) covered by this film of detergent at various times t (in seconds) was recorded. A video camera was used to record the actual spread of the detergent, and the areas were calculated.

t (sec)    0    1    2    3    4    5    6     7     8     9     10
A (cm²)    1.9  2.8  3.6  4.5  6.3  8.3  10.5  13.8  18.6  26.8  31.7

5. A mad scientist buys 1020 milligrams of Plutonium-239 for a science experiment. Plutonium-239 is a special isotope of plutonium that can be used as fuel for nuclear reactions (needed for time travel). Plutonium is subject to natural radioactive decay, so the amount of plutonium decreases over time. The mad scientist expects that it will take him 12 years to build the science experiment, and he is concerned that he won’t have the 500 milligrams needed for the experiment upon completion of the project.
He collects the following data over the next 10 years.

(a) A scatterplot of the logarithm (base 10) of the amount of plutonium versus time is shown. Based on this graph, explain why it would be reasonable to use an exponential model to describe the relationship between amount of plutonium and time since purchase.
(b) Here is some computer output from a linear regression analysis on the transformed data. Give the equation of the least-squares regression line. Be sure to define any variables you use.
(c) Use your model from part (b) to predict the amount of plutonium 12 years after purchase.

AP Statistics 3.3
Other Regression Models

Pick the Best Model (use r or r² and the residual plot)

Linear
Transformations: NONE
y-int: When we have zero (0) of x, we expect to have approx. ____ of y.
Slope: For every one unit increase in x, we expect a change of ____ in y.
Always in CONTEXT!!!

Exponential
Transformations: (x, log y)
y-int: When we have zero (0) of x, we expect to have approx. ____ of log y.
Rate: For every one unit increase in x, we expect a change of ____ in log y.
Always in CONTEXT!!!

Power
Transformations: (log x, log y)
y-int: When we have zero (0) of log x, we expect to have approx. ____ of log y.
Rate: For every one unit increase in log x, we expect a change of ____ in log y.
Always in CONTEXT!!!

Then, for the chosen model:
Find r and r²
Graph the residuals
Write the equation
Interpret r and r²
Interpret s
Calculate a prediction
Talk about a causal relationship
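
The "Pick the Best Model" comparison can also be sketched numerically. The Python sketch below uses the wolf population counts from problem 3 of Regression Practice and fits a least-squares line to (x, y), (x, log y), and (log x, log y). Several things here are assumptions made for illustration and are not part of the worksheet: the helper name least_squares, the recoding of the year as years since 1988 (so that log x is defined), and the choice to rank models by r² alone. A residual plot for each fit should still be checked before settling on a model.

```python
import math


def least_squares(xs, ys):
    """Return (slope, intercept, r_squared) for the least-squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the line passes through (x_bar, y_bar)
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Wolf data from problem 3 (1989-1998).
# Assumption: x = years since 1988, so x runs 1..10 and log10(x) exists.
years_since_1988 = list(range(1, 11))
population = [10, 12, 16, 23, 27, 34, 43, 55, 75, 93]

log_x = [math.log10(x) for x in years_since_1988]
log_y = [math.log10(y) for y in population]

# One fit per candidate model, using the transformations from the chart.
fits = {
    "linear": least_squares(years_since_1988, population),   # (x, y)
    "exponential": least_squares(years_since_1988, log_y),   # (x, log y)
    "power": least_squares(log_x, log_y),                    # (log x, log y)
}

for name, (slope, intercept, r2) in fits.items():
    print(f"{name:12s} slope={slope:.4f} intercept={intercept:.4f} r^2={r2:.4f}")

# Rank by r^2 only (an assumed shortcut; residual plots matter too).
best = max(fits, key=lambda name: fits[name][2])
print("highest r^2:", best)
```

For these data the exponential fit has the highest r² (about 0.997, against about 0.91 for the untransformed linear fit). To turn the exponential fit back into a prediction in original units, undo the logarithm: predicted population = 10^(intercept + slope · x), where x is years since 1988 under the recoding assumed above.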