Introduction to Scatterplots and Association



Lesson: Introduction to Scatterplots and Bivariate Relationships

Learning Goals and Concept Flow

Statway Goal S.2 Demonstrate the use of distributional thinking to reason about data in order to describe and summarize distributions of data, identify trends and patterns …

The series of tasks in this introductory lesson is designed to motivate an initial and informal understanding of concepts related to interpreting scatter plots.

Specifically, we want students to gain a preliminary and informal understanding that:

• Each point on the scatter plot represents a single observation consisting of measurements on two variables.

• Patterns in the data can help us estimate a reasonable value for y given a value for x.

• The overall pattern in a scatter plot can be described in terms of the direction, form and strength of the relationship between the two measurements. (It is also important to note deviations from the pattern, though outliers are not discussed in this lesson.)

• Upward and downward trends can be summarized by a line; the direction is described as a positive or negative linear relationship.

• Strength of the relationship is a statement about variability (scatter) and the degree to which x explains y (the latter idea is developed in future lessons). Variability can be explained by increases in x or by additional explanatory factors having influence on y.

Students should begin to learn how to:

• Interpret the meaning of particular points on the plot;

• Interpret the scatter plot by reference to the actual objects and variables measured;

• Relate patterns in data to direction and linear form;

• Identify whether the relationship is positive or negative;

• Assess the strength of the relationship informally by looking at the degree of scatter.

• Make predictions that fit the overall pattern in the data.

In this series of tasks we are not formally developing the statistical concepts of correlation or regression. Rather, we are working to build students’ ability to see associative trends through the noise of real data and to make decisions in the face of variability. After this lesson, students will be introduced to the concept of a regression line as a way of describing the central tendency of a bivariate distribution (much like the mean describes the central tendency of a univariate distribution). They will see that the slope of the regression line represents the direction of a bivariate relationship. They will also see that the variability around the regression line is analogous to variability around the mean of a univariate distribution. This variability can be measured by a correlation coefficient and understood when data is seen in relation to mean lines, much as the standard deviation measures variability relative to the mean in univariate data.

Developmental Math Connections

Since this is an introduction to working with bivariate data, the learning goals are focused on interpreting scatter plots and motivating an initial and informal understanding of concepts related to modeling data with a linear function. In future lessons these ideas will become more explicit.

In this lesson developmental math concepts are thoroughly integrated into the conceptual subtasks and not taught as separate skills. (This will not always be the case.) For example, the scaffolding provided in the first subtask is designed especially for developmental students to facilitate the interpreting of scatter plots. These types of questions might be atypical in a traditional statistics course, where we might assume that students are able to read and interpret scatter plots without this kind of scaffolding.

You might wonder why we did not begin with a discussion of the Cartesian plane or an exercise in which students construct a scatter plot, as we might in an elementary algebra course. Our rationale is multifold. First, in Statway we have a more heterogeneous group of students since students typically found in community college elementary algebra and intermediate algebra are together in the same class. So we need to find ways to level the playing field so that some students are not bored and others lost. Of course, this level playing field means that all students must be able to bring something to the task that puts them into play, so to speak. Second, we build some of these skills directly into the lesson. For example, students have to plot points in Subtask 2, but the plotting of points is connected to making predictions that fit the trend in the data. So point plotting is in the service of the learning goal of interpreting trends in bivariate data. Third, research suggests that students need to struggle with meaningful tasks before a careful explication of concepts will help them construct deeply held understandings. The lesson is constructed to provide an opportunity for productive struggle, followed by carefully constructed subtasks that we think will be sufficient for all students to begin to make sense of the concepts at hand.

If you anticipate that some students, due to deficits in developmental math skills, may have trouble at specific points in the lesson, be vigilant about observing students’ work to see if your expectations are well founded. Remember our hypothesis that productive struggle is a key ingredient in facilitating conceptual learning. Of course you will need to use your own judgment about when to intercede to work with students individually. Be sure to report your observations back to the Lesson Study group. This will be invaluable information for improving the lesson.

Lesson Structure:

This lesson has the following components:

Introduction to the context of the lesson (5-10 minutes)

Part I: Students work on a rich task (10-15 minutes)

Part II: Scaffolded Conceptual Subtasks (via Discussion/Group Work/Lecture) (25-35 minutes)

Part III: Culminating Tasks for Homework (done outside of class) (5 minutes)

This lesson is designed to help students begin to understand concepts related to interpreting scatterplots. The lesson structure reflects what we have gleaned from current research in the teaching and learning of mathematics and the GAISE document. Part of our research agenda is the study of whether the conceptual subtasks included in each lesson have the cumulative effect of producing deeply held quantitative reasoning skills. For this reason, we ask that instructors teach the lesson as it is, so that we can gather authentic information about the lesson in its current form. During the lesson study process, we will gather your feedback on the lesson, the sequencing of concepts, and students’ work, all of which will inform lesson development and improvement.

Introduction to the context of the lesson:

Show a box of a children’s cereal. Ask students to rate this cereal on a scale of 0-100, with 0=unhealthy and 100=very nutritious and good for you. Call on a few to get a sense of rating variability. Repeat with a healthy cereal. Then ask students to describe the ingredients they are using to rate cereals. “Are you using one ingredient or more than one to determine your rating?” Make associative statements like “so in your rating system, large amounts of sugar relate to lower ratings.”

Give a brief description of Consumer Reports as a non-profit organization that provides independent ratings to help consumers make informed choices. Show a copy of Consumer Reports. Hand out the cereal graphs and the first task. Explain that Consumer Reports rated 77 cereals. We have gathered ingredient information about these 77 cereals from cereal boxes and created scatterplots relating each ingredient to the Consumer Report rating.

PART I: Students work on a rich task

Purpose of Part I: Provide an opportunity for students to struggle with important statistical ideas. Struggle is an important component of building conceptual understanding. Working with a rich task requires expending effort to figure something out that is not readily apparent and thus builds persistence and the ability to reason in situations requiring non-algorithmic thinking. It is the opposite of memorizing presented information or practicing what has been demonstrated. Research also suggests that struggle with a well-designed task prepares students to learn more effectively from subsequent direct instruction, such as lecture.

Introduce the task: Use the introductory material that is part of the student hand-out (also given below).

Students work on the task alone for 3 minutes, then in small groups for 5-10 minutes (depending on your sense of whether productive conversations are occurring). During this time, listen to how students are reasoning as they discuss the task. Collect the following information for the Lesson Study: (1) Tally how many students are making decisions based solely on their prior beliefs without using the data. (2) Make a few notes about how students are initially reasoning with the data.

Task (as on student hand-out): At the end of this lesson you will be designing a children’s cereal. You want your cereal to receive above average Consumer Report ratings but also taste good. Of course, we all know that what tastes good may not be the most nutritious. So this will be a balancing act.

Since you want your new cereal to be rated above average by Consumer Reports, you need to get a sense of how Consumer Reports determines their ratings. Consumer Reports has not made their rating formula public. So you will analyze some scatterplots to figure out what ingredients influence their ratings. Each scatterplot relates one ingredient to the Consumer Report rating for 77 breakfast cereals. We just chose ingredients that we thought might be used in the Consumer Reports formula. We actually don’t know for sure which ones are used to determine the rating.

Your job is to answer these questions AND write down enough of your reasoning that someone can follow your thinking:

1) Pick a scatterplot with a pattern that fits your expectations and a scatterplot with a pattern that is surprising to you. Explain why the patterns you see are expected or surprising.

2) Pick one ingredient that you think is influential in determining the Consumer Report rating. Now pick an ingredient that you think is not influential.

3) How do the patterns in the data support your choices in (2)?

Closure for Part I and transition to Part II:

Part I functions to provide students the opportunity to struggle with important ideas (like interpreting scatterplots and seeing patterns that relate to a question at hand). So, at this point in the lesson, you do not need to guide students to discover canonical ideas, such as correlation, or even correct their misconceptions or fix their errors. The next segment of the lesson, Part II, is designed with more explicit attention to developing the skills and understandings described in the learning outcomes. It is in Part II that we will make connections, as well as correct errors and address misconceptions as appropriate.

To validate student effort, bring closure to Part I, and transition into the next part of the lesson in a timely fashion, use one of these options:

1) Tell students the purpose for Part I and move on to Part II.

You might say something like, “Research suggests that you build deeper understandings if you try to make meaning of something or figure something out before you hear about big concepts. So now that you have done some initial brainstorming and gotten familiar with the scatterplots, you are better prepared to benefit from the next part of the lesson where we will work in smaller steps to interpret scatterplots, see patterns, and make decisions based on these patterns.”

In this option you do not need to facilitate any type of class discussion. Just move into Part II of the lesson.

2) Listen, reiterate, and create openings for further investigation, then move on to Part II.

Call on a student, listen carefully and respectfully to his/her thinking, reiterate the key points, and highlight ways in which the student’s observations create openings for further investigation (see examples below). Repeat with a few students, then move into Part II.

• “It is interesting how you focused on the vertical stacks of data in the protein graph. This makes me wonder why some graphs have these stacks and others don’t. Let’s ponder this as we move into the next part of the lesson.”

• “I hear you saying that sugar is bad for us. So in your rating system lots of sugar would lower the rating. This makes me want to see if a similar pattern is true for the Consumer Report rating. Looking for patterns is the focus of the next part of the lesson, so we can investigate this idea in a minute.”

• “I hear you using the idea of slope in describing patterns you see in this data. This makes me think that you are somehow using lines to describe patterns in the data. It is a powerful problem-solving move to search for and use ideas from your previous math experiences when you are confronted with a new situation. Let’s keep your idea in mind as we move into the next part of the lesson.”

In this option you are not required to correct student errors or facilitate other students’ understanding of the observations. The purpose is to demonstrate listening skills and highlight ways to move an investigation forward by noticing and pondering. Look for opportunities to address these “openings” in Part II after students have worked with the conceptual subtasks. At that point, more students will be prepared to learn from the explanations.

3) Facilitate a class discussion that builds on student reasoning to accomplish the learning goals. If you choose this option, you can use the questions in the Conceptual Subtasks to scaffold the discussion.

Part II: Scaffolded Conceptual Subtasks

Purpose of Part II: Part II breaks down the learning goals into two conceptual subtasks. The purpose of Part II is to provide a coherent and structured path that fosters conceptual understanding of the ideas articulated in the learning goals.

Research supports the rather obvious claim that teaching which attends explicitly to concepts produces conceptual understanding; however, research is inconclusive about the impact of different forms of instruction on producing conceptual understanding. For this reason, the conceptual subtasks can be addressed using one or a combination of three approaches: whole class discussion, small group work followed by whole class discussion, or lecture. In discussion you can draw on the subtasks for conceptual questions that build on student reasoning and move the entire class toward learning goals. If you decide to use scaffolded tasks as group work, the subtasks can be used verbatim as group work activities focused on smaller conceptual bites. You can also use the subtasks to guide your lecture. These approaches are discussed at greater length at the end of this document.

By the end of Part II, students should be prepared to do productive work on the culminating tasks that comprise the homework in Part III. Successful completion of the homework tasks will demonstrate that a student has achieved the learning goals for this lesson.

Conceptual SubTasks for this Lesson

Conceptual SubTask 1: Reading and Interpreting Scatterplots

Goals:

• Interpret the meaning of particular points in a scatterplot;

• Interpret the scatter plot by reference to the actual observations and variables measured;

• Understand that each point on the scatter plot represents a single observation consisting of measurements on two variables.

• Understand that variability is due in part to other explanatory factors influencing the response (ratings).

Introduction: Tell students that we are now going to narrow our focus and work on interpreting scatterplots. Give instructions appropriate to the facilitation strategy that you have chosen.

Scaffolded Questions About the Data Set:

1) Captain Crunch has the lowest Consumer Report rating of the 77 cereals in the data set. How much fat is in a serving of Captain Crunch?

2) In this set of 77 cereals, Product 19 has the most sodium in a serving. What is the rating for Product 19?

3) All-Bran Extra Fiber is the cereal with the highest rating. How much sugar, fat, sodium and fiber are in a serving of All-Bran Extra Fiber?

“Wrap up” Questions/Direct Instruction about Statistical Concepts:

If students have been working in groups, you will have some sense of where they had difficulty. For the subsequent whole class discussion, address areas of difficulty and answer questions in the context of the “wrap-up” questions. (You will not have time to go over the answers to the Scaffolded Questions.)

To bring closure, ask students to discuss the following question or deliver a short mini-lecture that answers this question:

• When a statistician reads a scatter plot, she will ask herself two questions: (1) Who or what is described by the data, i.e. what does a dot represent? And (2) what measurements were made, i.e. what are the variables? Pick a scatter plot and answer these two questions.
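For instructors who like to see the two questions in concrete form, the following is a minimal sketch of how a single dot can be thought of as one record holding two measurements. The class name, field names, cereal names, and numbers are all hypothetical and chosen only for illustration; they are not the Consumer Reports data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One dot on a scatter plot: one cereal measured on two variables."""
    cereal: str          # who or what the dot describes
    sugar_grams: float   # x: the measurement on the horizontal axis
    rating: float        # y: the measurement on the vertical axis


# Hypothetical values for illustration only, not the real 77 cereals.
observations = [
    Observation("Cereal A", 3.0, 68.0),
    Observation("Cereal B", 9.0, 41.0),
    Observation("Cereal C", 14.0, 25.0),
]

# Question 1: what does a dot represent? One cereal (one record).
# Question 2: what are the variables? The two fields plotted on the axes.
points = [(obs.sugar_grams, obs.rating) for obs in observations]

# Reading a point "backwards", as in the scaffolded questions:
# find the lowest-rated cereal, then read off its other measurement.
lowest = min(observations, key=lambda obs: obs.rating)
print(f"{lowest.cereal}: {lowest.sugar_grams} g sugar, rating {lowest.rating}")
```

The last two lines mirror the scaffolded questions above: locate a cereal by one measurement, then read its other measurement from the same record.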

Conceptual SubTask 2: Seeing Patterns and Relationships in Scatterplots

Goals: In Conceptual Subtask 2 the goal is for students to begin to develop an understanding that

• Patterns in the data can help us estimate a reasonable value for y given a value for x;

• Upward and downward trends can be summarized by a line;

• How well the line describes the pattern in the data depends on the amount of variability in the data;

• Variability can be explained by increases in x or by additional explanatory factors having influence on y.

Students should begin to learn how to:

• Relate patterns in data to direction and linear form;

• Identify whether the relationship is positive or negative;

• Assess the strength of the relationship informally by looking at the degree of scatter.

• Make predictions that fit the overall pattern in the data.

Introduction: Now we will continue our detective work with Consumer Report ratings. We want to try to pick an ingredient that influences the ratings and one that doesn’t. To do this we need to understand more about patterns in the data and how the data is scattered.

Scaffolded Questions About the Data Set:

1) Compare and contrast the sugar-ratings scatter plot with the protein-ratings scatter plot.

• How are the patterns in these two scatter plots similar? How are they different? Are the patterns you see what you expected? Why or why not?

2) Think about variability

• There are 3 cereals that have 2 grams of sugar in a serving. Find these 3 cereals in a scatter plot. Do these cereals have the same rating? If the ratings differ, what might explain the variability in the ratings?

3) Make some predictions.

• A new cereal for children has 18 grams of sugar per serving. Approximately what rating do you think the cereal will receive?

• Use your prediction to plot this new cereal on the sugar-rating scatterplot.

• Several popular diets advocate high protein intake. Three new cereals are being developed for this market. All will have 5 grams of protein per serving. What do you think is a reasonable range for the ratings for these three cereals? Explain how you determined a reasonable range.

• Use your predictions to plot these three cereals on the scatter plot.

• In which situation (high sugar cereals or high protein cereals) do you feel the most confident about your predicted ratings? How is this confidence (or lack of confidence) related to what you see in the scatter plots?
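The prediction questions above ask students to reason informally from nearby points. One way an instructor might make that reasoning explicit is sketched below: look at the cereals whose x-value is close to the value of interest, and report the lowest, typical, and highest ratings among them. The data, the function names, and the choice of a window of 2 grams are assumptions made for illustration; this is not a method prescribed by the lesson.

```python
from statistics import mean

# Hypothetical (sugar grams, rating) pairs for illustration only.
sugar_vs_rating = [
    (1, 65.0), (3, 60.0), (3, 52.0), (6, 47.0), (7, 40.0),
    (9, 38.0), (10, 33.0), (12, 30.0), (13, 26.0), (15, 22.0),
]


def nearby_ratings(data, x_value, window):
    """Return the ratings of observations whose x is within `window` of x_value."""
    return [y for x, y in data if abs(x - x_value) <= window]


def informal_prediction(data, x_value, window):
    """Return (low, typical, high) ratings among points near x_value."""
    ratings = nearby_ratings(data, x_value, window)
    if not ratings:
        # No nearby dots: the prediction would have to extend the trend.
        raise ValueError("no observations near that x value")
    return min(ratings), mean(ratings), max(ratings)


low, typical, high = informal_prediction(sugar_vs_rating, x_value=12, window=2)
print(f"reasonable range: {low} to {high}, typical about {typical:.1f}")
```

A narrow range corresponds to the situation where students feel confident in their prediction; a wide range corresponds to the situation where they do not. When no dots lie near the chosen x-value, the sketch raises an error, which is a reminder that such a prediction relies on extending the overall trend rather than on nearby data.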

“Wrap up” Questions/Direct Instruction about Statistical Concepts:

Mini-lecture as an introduction to the following concepts: form (linear), direction (positive or negative), strength of relationship (amount of variability, how much x explains y).

Form and Direction:

(1) We can see some trends in the data. For example, cereals with larger amounts of sugar have lower ratings. Statisticians use lines to summarize the overall pattern and direction of the relationship between the two measurements in a scatter plot. Show regression lines on scatter plots. The line highlights the direction of the trend and gives us a way to describe a relationship between the measurements.

(2) Discuss positive and negative relationship informally. Positive relationship suggests that larger values for x tend to correspond to larger values for y; trend lines have a positive slope. A negative relationship suggests that larger values of x tend to correspond to smaller values for y; trend lines have a negative slope.

• Which ingredients are positively associated with ratings? Negatively associated with ratings? Does this positive or negative association make sense given your own understanding about the nutritional value of the ingredient?

• Analyze the vitamin-ratings relationship. Why is this relationship surprising? What could explain the negative relationship? (Perhaps Consumer Reports is not using vitamins in their formula.) Why? (There is very little variability in the vitamin content among the cereals, so it is not a useful way to distinguish cereals.)

We will learn more about these trend lines in future lessons.
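As background for instructors preparing the regression lines to show, here is a minimal sketch of how the sign of a trend line's slope encodes direction. It assumes the trend line is an ordinary least-squares line, which is one common choice; the data values and function names are hypothetical and used only for illustration.

```python
def least_squares_line(points):
    """Return (slope, intercept) of the least-squares line through points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def direction(points):
    """Describe the relationship by the sign of the trend line's slope."""
    slope, _ = least_squares_line(points)
    if slope > 0:
        return "positive"
    if slope < 0:
        return "negative"
    return "none"


# Hypothetical (grams per serving, rating) pairs for illustration only.
sugar = [(1, 65.0), (3, 58.0), (6, 47.0), (9, 38.0), (12, 30.0), (15, 22.0)]
fiber = [(0, 25.0), (1, 33.0), (2, 36.0), (3, 45.0), (5, 58.0), (9, 80.0)]

print("sugar:", direction(sugar))
print("fiber:", direction(fiber))
```

In this made-up data, larger amounts of sugar go with lower ratings, so the slope is negative; larger amounts of fiber go with higher ratings, so the slope is positive.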

Strength:

Of course, if we are using lines to summarize trends, we need to ask how well the line really describes the relationship between the nutrient and ratings. Statisticians look at how much the data is scattered about the line to determine the strength of the relationship between the variables. In a future lesson we will develop a way to measure the strength of the relationship. For now we will eyeball it.

• Which ingredient appears to have the strongest relationship with ratings?

• Many of the ingredients have weaker relationships with ratings. Sodium and fat are the weakest. It is hard to eyeball the strength from these examples. We need a better way to measure the strength of a relationship. We will work on this in a future lesson.

If there is a lot of variability about the line, the ingredient alone is not influential in determining the ratings.
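To make "scatter about the line" concrete before a formal measure is introduced, the sketch below computes the typical vertical distance of the dots from a least-squares line (the root mean square of the residuals). This is one possible way to quantify the eyeball judgment, not the measure developed in later lessons, and the two data sets and function names are hypothetical.

```python
from math import sqrt


def least_squares_line(points):
    """Return (slope, intercept) of the least-squares line through points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def scatter_about_line(points):
    """Root mean square vertical distance of the points from their trend line."""
    slope, intercept = least_squares_line(points)
    residuals = [y - (slope * x + intercept) for x, y in points]
    return sqrt(sum(r * r for r in residuals) / len(residuals))


# Hypothetical (grams per serving, rating) pairs for illustration only.
tight = [(1, 62.0), (4, 52.0), (7, 41.0), (10, 32.0), (13, 21.0)]
loose = [(1, 70.0), (4, 35.0), (7, 58.0), (10, 20.0), (13, 40.0)]

print(f"tight scatter: {scatter_about_line(tight):.1f} rating points")
print(f"loose scatter: {scatter_about_line(loose):.1f} rating points")
```

Because both results are in rating points, they can be compared directly: the smaller number corresponds to the plot where the dots hug the line and the relationship looks stronger.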

Thinking more about variability: Statisticians always look at the variability in the data and try to explain it. Some of the variability in ratings can be explained by changing the amount of an ingredient. For example, we can see that cereals with larger amounts of sugar tend to have lower ratings. The line is one way to describe this kind of variability in ratings: we change the amount of an ingredient and the line summarizes how the ratings change. But there is also variability that is not explained by the amount of the ingredient. For example, we can see many cereals with the same amount of protein but different ratings. What might explain the variability in ratings for cereals with the same amount of a given ingredient?

Part III: Culminating activities for homework

(1) Summarize what you feel you learned today.

(2) You are developing a new cereal for children. You want your cereal to rate above average on the Consumer Report rating scale so that the cereal appeals to parents, but you also want the cereal to taste good to children. The average rating for these 77 cereals is about 43. From marketing research we know that in blind taste tests, children prefer cereals that are not good for them. They like cereals high in sugar, low in protein and fiber, with lots of salt.

a. Which two ingredients do you think are influential in determining the Consumer Report ratings? Which two ingredients do not seem to be influential? Use the data to support your choices. Write a short explanation of how you used the data.

b. For the two ingredients you chose as most important, how much of each will you use in your cereal recipe? Describe how you thought through this decision. Remember that you want to keep your rating above average.

Plus other activities pending our upcoming Monday discussion of homework.

In addition to the “design a children’s cereal” task above, I suggest online activities similar to the following:

(1) Given a variety of scatterplots and associated scenarios, identify what a dot represents and answer questions similar to Subtask 1.

(2) Given scatterplots, identify those with linear form, those with strong and weak linear relationships.

(3) Given scatterplots and specified x-values, identify reasonable predictions for y-values.

Instructional Approaches for use in Part II

Approach I: Discussion

The discussion approach attempts to achieve the learning goals via instructor-led discussion about the problem, questions, and data set. The discussion follows the same conceptual flow as the sequential subtasks; however, rather than digress from the larger problem and questions to work on subtasks and discuss subtask questions, the instructor essentially covers the same conceptual and practical ground through extended discussion of students’ work on the larger problem and questions of Part I. For the students then, the discussion approach in Part II would feel like a direct follow-up to their work on the questions of Part I.

Suggestions for conducting discussion:

Prepare for the discussion by monitoring students’ responses to the rich task with the goal of structuring the discussion. Look and listen for student reasoning that you want to bring forward. Also listen for student difficulty relative to learning goals. The questions outlined in the subtasks will be useful in moving students who are struggling into a productive zone of work. Use these questions during the discussion at appropriate times to help students understand each other’s ideas, highlight and name concepts, and make connections.

Using this type of Socratic method requires that you (1) build in think time, (2) elicit student reasoning, not just answers, (3) respond to students in ways that focus on the learning goals for this lesson, (4) use strategies to bring many students into the conversation, (5) informally track understanding across the class, and, most importantly, (6) make statistical concepts explicit.

Approach II: Sequential Conceptual Subtasks Completed as Group Work

The sequential subtasks approach attempts to achieve the learning goals via paired or small group work and instructor-led discussion. Discussion is focused on carefully selected and ordered subtasks related to the larger problem and dataset, and carefully selected and ordered questions within each subtask. Each subtask concludes with wrap-up questions about the statistics to be discussed with the whole class in order to explicitly articulate statistical concepts most clearly illuminated by the subtask.

Approach III: Lecture

The lecture (or demonstration) approach attempts to achieve the learning goals via instructor explanation (and perhaps demonstration) of the Part II subtasks. The instructor explains and shows students how one might go about answering the subtask questions using the available data. Throughout the course of this explanation (demonstration) or at the end of each subtask, the instructor then explains to the students the statistical concepts related to the learning goals illuminated by the subtask.
