Kenwood Academy



AP Statistics

Chapter 1 - Exploring Data

|Introduction - Data Analysis: Making Sense of Data |Objectives: |

| |DEFINE “Individuals” and “Variables” |

| |DISTINGUISH between “Categorical” and “Quantitative” variables |

| |DEFINE “Distribution” |

|Statistics |the science of data |

|Data Analysis |a process of describing data using graphs and numerical summaries |

|Individuals |the objects described by a set of data. Individuals may be people, animals, or things. |

|Variable |any characteristic of an individual. A variable can take different values for different individuals. |

| |Categorical variable – places an individual into one of several groups or categories, such as hair color and marital status |

| |Quantitative variable – takes numerical values for which it makes sense to find an average |

| |Is zip code categorical or quantitative? |

|Distribution |tells what values a variable takes and how often it takes these values |

|How to Explore Data: |Examine each variable by itself. Then study relationships among the variables. |

| |Start with a graph or graphs. Add numerical summaries. |

|Example |Check Your Understanding pg. 5 |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

|Displaying Distributions with Graphs | |

| | |

| | |

| | |

| | |

|Displaying Categorical Variables: Bar and | |

|Pie Graphs | |

| | |

| |Objectives: |

| |CONSTRUCT and INTERPRET bar graphs and pie charts |

| |RECOGNIZE “good” and “bad” graphs |

| |CONSTRUCT and INTERPRET two-way tables |

| |DESCRIBE relationships between two categorical variables |

| |ORGANIZE statistical problems |

| |The number one rule of data analysis is to MAKE a PICTURE. To decide what type of picture (visual display) is appropriate, identify the variable. Is it categorical (counts) or quantitative (measurements)? |

|Frequency table |Displays the count (frequency) of observations in each category or class. |

| | |

| |[pic] |
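A frequency table can be sketched in a few lines of Python. The hair-color values below are made-up illustration data (not from the textbook), and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter

def frequency_table(values):
    """Return {category: (count, percent)} for a categorical variable."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}

# Made-up data for illustration only.
hair_color = ["brown", "black", "brown", "blonde", "brown", "black", "red", "brown"]

table = frequency_table(hair_color)
for category, (count, percent) in table.items():
    print(f"{category:<8}{count:>3}{percent:>7.1f}%")
```

Either column (counts or percents) could be used for the vertical axis of a bar graph.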

| | |

| | |

|Bar Graph: |Bar graphs compare several quantities by comparing the heights of bars that represent those quantities. |

| |Bar graphs have spaces between each category of the variable. |

| |The order of the categories is not important. |

| |Either counts or proportions may be shown on the vertical axis. |

| |Make sure you include a title and appropriate labels for each axis. |

| | |

| | |

|Pie Graphs: |Must include all the categories that make up the whole. |

| |Should be constructed with computer software. |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

|Misleading Graphs | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

|Two-Way Tables and Marginal Distributions |A two-way table of counts organizes data about two categorical variables. |

| | |

| |What are the variables described by this two-way table? |

| | |

| | |

| |How many young adults were surveyed? |

| | |

| | |

| | |

| | |

| | |

| | |

|Marginal Distribution | |

| |The marginal distribution of one of the categorical variables in a two-way table of counts is the distribution of values|

| |of that variable among all individuals described by the table. |

| | |

| |To examine a marginal distribution, |

| |Use the data in the table to calculate the marginal distribution (in percents) of the row or column totals. |

| |Make a graph to display the marginal distribution. |
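The marginal-distribution steps above can be sketched in Python. The two-way table below is invented for illustration (it is not the textbook's table), and the row and column names are placeholders.

```python
# Hypothetical two-way table of counts: rows are one categorical variable,
# columns are the other. All numbers are invented for illustration.
counts = {
    "Low":    {"Group A": 20, "Group B": 30},
    "Medium": {"Group A": 50, "Group B": 40},
    "High":   {"Group A": 30, "Group B": 30},
}

def marginal_distribution_rows(table):
    """Percent of ALL individuals that fall in each row category."""
    row_totals = {row: sum(cols.values()) for row, cols in table.items()}
    grand_total = sum(row_totals.values())
    return {row: 100 * total / grand_total for row, total in row_totals.items()}

marginal = marginal_distribution_rows(counts)
for row, percent in marginal.items():
    print(f"{row:<8}{percent:6.1f}%")
```

The grand total (here 200) answers a question like "How many individuals were surveyed?", and the percents are what a bar graph of the marginal distribution would display.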

| | |

| |See Example page 11-12 |

|Example | |

| | |

| |Check Your Understanding pg. 12 |

| | |

| | |

| | |

| | |

| | |

|Relationships Between Categorical | |

|Variables: Conditional Distributions | |

| | |

| |A Conditional Distribution of a variable describes the values of that variable among individuals who have a specific |

| |value of another variable. |

| | |

| |To examine or compare conditional distributions, |

| |Select the row(s) or column(s) of interest. |

| |Use the data in the table to calculate the conditional distribution (in percents) of the row(s) or column(s). |

| |Make a graph to display the conditional distribution. |

| |Use a side-by-side bar graph or segmented bar graph to compare distributions. |
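A conditional distribution uses one column (or row) total as the denominator instead of the grand total. The sketch below assumes an invented two-way table; the names `counts` and `conditional_distribution` are hypothetical.

```python
# Hypothetical two-way table of counts (invented numbers, placeholder names).
counts = {
    "Low":    {"Group A": 20, "Group B": 30},
    "Medium": {"Group A": 50, "Group B": 40},
    "High":   {"Group A": 30, "Group B": 30},
}

def conditional_distribution(table, column):
    """Distribution of the row variable among individuals in one column only."""
    column_total = sum(cols[column] for cols in table.values())
    return {row: 100 * cols[column] / column_total for row, cols in table.items()}

given_a = conditional_distribution(counts, "Group A")
given_b = conditional_distribution(counts, "Group B")
for row in counts:
    print(f"{row:<8}{given_a[row]:6.1f}%{given_b[row]:8.1f}%")
```

Printing the two conditional distributions side by side mirrors what a side-by-side bar graph shows: if the percents differ noticeably between groups, the two variables appear to be related.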

| | |

| |See Example page 15 |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

|Example | |

| | |

| |Describe the conditional distribution in the chart above. |

| | |

| | |

| |Check Your Understanding pg. 17 |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

|Organizing a Statistical Problem | |

| | |

| | |

| | |

| | |

| |How to Organize a Statistical Problem: A Four-Step Process |

| |State: What’s the question that you’re trying to answer? |

| |Plan: How will you go about answering the question? What statistical techniques does this problem call for? |

| |Do: Make graphs and carry out needed calculations. |

| |Conclude: Give your practical conclusion in the setting of the real-world problem. |

| |See Example page 18 |

|Data Exploration page 19-20 | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

|1.2 Displaying Quantitative Data with | |

|Graphs | |

| | |

| | |

| | |

| |Objectives: |

| |CONSTRUCT and INTERPRET dotplots, stemplots, and histograms |

| |DESCRIBE the shape of a distribution |

| |COMPARE distributions |

| |USE histograms wisely |

|Dotplots |How to Construct a Dotplot: |

| |Draw a horizontal axis (a number line) and label it with the variable name. |

| |Scale the axis from the minimum to the maximum value. |

| |Mark a dot above the location on the horizontal axis corresponding to each data value. |

| |Useful for small data sets |

| |[pic] |
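The dotplot steps can be imitated with a rough text sketch in Python. It assumes whole-number data; the data values and the helper name `text_dotplot` are made up for illustration.

```python
from collections import Counter

def text_dotplot(values, label):
    """Text dotplot for whole-number data: one '*' stacked above each value."""
    counts = Counter(values)
    axis = range(min(values), max(values) + 1)   # scale from minimum to maximum
    width = max(len(str(v)) for v in axis) + 1
    rows = []
    # Build the stacks from the top row down.
    for level in range(max(counts.values()), 0, -1):
        rows.append("".join(("*" if counts[v] >= level else " ").rjust(width) for v in axis))
    rows.append("".join(str(v).rjust(width) for v in axis))   # the number line
    rows.append(label)                                        # the variable name
    return "\n".join(rows)

goals = [0, 1, 1, 2, 2, 2, 3, 3, 5]   # made-up data
print(text_dotplot(goals, "Goals scored"))
```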

| | |

|Don’t Forget Your SOCS! | |

| |How to Examine the Distribution of a Quantitative Variable: |

| |Shape |

| |Outliers |

| |Center |

| |Spread |

|Describing Shape | |

| | |

| | |

| |In general, when looking at graphs, look for an overall pattern and also for striking deviations from that pattern. |

| |To give the overall pattern of a distribution: |

| |Give the center and spread. |

| |See if the distribution has a simple shape that you can describe in a few words (skewed to the right, skewed to the left, symmetric). |

|Symmetric |A distribution is symmetric if the right and left sides are approximately mirror images of each other. |

|Skewness |A distribution is skewed to the right (right-skewed) if the right side of the graph (containing the half of the observations with larger values) is much longer than the left side. |

| |It is skewed to the left (left-skewed) if the left side of the graph is much longer than the right side. |

| | |

| |Example shapes (figures omitted): Symmetric, Skewed left, Skewed right |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

|Outliers |An outlier in any graph of data is an individual observation that falls outside the overall pattern of the graph. Once you have spotted outliers, look for an explanation. |

|Unimodal |Distribution with a single peak |

|Bimodal |Distribution with two clear peaks |

|Multimodal |Distribution with more than two peaks |

|Example | |

| | |

| | |

| | |

| |Check Your Understanding page 31 |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

|Comparing Distributions |See Example and AP Exam Tip page 32 |

|Stemplots |How to construct a Stemplot: |

| |Separate each observation into a stem (all but the final digit) and a leaf (the final digit). |

| |Write all possible stems from the smallest to the largest in a vertical column and draw a vertical line to the right of |

| |the column. |

| |Write each leaf in the row to the right of its stem. |

| |Arrange the leaves in increasing order out from the stem. |

| |Provide a key that explains in context what the stems and leaves represent. |

| | |

| |Useful for small to medium data sets |

| |Individual data values are preserved |

| |When you have few stems it may be helpful to split the stems to get a better idea of the shape of the graph |
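The stemplot steps can be sketched in Python as follows. This is a minimal version that assumes non-negative whole numbers and does not split stems; the scores and the name `stemplot` are made up for illustration.

```python
from collections import defaultdict

def stemplot(values):
    """Text stemplot: stem = all but the final digit, leaf = the final digit."""
    leaves = defaultdict(list)
    for v in sorted(values):                 # sorting puts leaves in increasing order
        leaves[v // 10].append(v % 10)
    rows = []
    for stem in range(min(leaves), max(leaves) + 1):   # every possible stem, even if empty
        row_leaves = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem:>2} | {row_leaves}")
    return "\n".join(rows)

scores = [12, 15, 21, 23, 23, 28, 30, 34, 47, 51]   # made-up data
print(stemplot(scores))
print("Key: 2 | 3 means a score of 23")
```

Because every leaf is printed, the individual data values can be read back from the plot.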

| | |

| |[pic] |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

|Splitting Stems |When data values are “bunched up”, we can get a better picture of the distribution by splitting stems. |

| |[pic] |

|Back-to-Back Stemplots |Two distributions of the same quantitative variable can be compared using a back-to-back stemplot with common stems. |

| |[pic] |

| |Check Your Understanding page 34-35 |

| | |

| |Divide the range of data into classes of equal width. |

| |Find the count (frequency) or percent (relative frequency) of individuals in each class. |

| |Label and scale your axes and draw the histogram. The height of the bar equals its frequency. Adjacent bars should |

| |touch, unless a class contains no individuals. |
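The first two histogram steps (equal-width classes, then counts or percents) can be sketched in Python. The data, the class boundaries, and the name `class_counts` are assumptions chosen for illustration.

```python
def class_counts(values, start, width, n_classes):
    """Count observations in equal-width classes [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < n_classes:        # values outside the classes are ignored in this sketch
            counts[k] += 1
    return counts

data = [3, 7, 8, 12, 14, 15, 15, 19, 22, 28]   # made-up data
counts = class_counts(data, start=0, width=10, n_classes=3)

for k, count in enumerate(counts):
    lower = 10 * k
    percent = 100 * count / len(data)
    print(f"{lower:>2} to <{lower + 10:<3}{'#' * count:<6}{count} ({percent:.0f}%)")
```

Each row of `#` marks plays the role of one bar; the bar's height is the class frequency.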

| | |

| | |

| | |

| |See Example page 35-36 |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

|Example | |

| | |

| | |

| | |

|Histograms | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

|Histograms on the Calculator | |

| |Check Your Understanding page 39 |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| |Check Your Understanding page 41 |

| | |

| | |

| | |

| | |

|Example | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

|1.3 Describing Quantitative Data with Numbers | |

| |Objectives: |

| |MEASURE center with the mean and median |

| |MEASURE spread with standard deviation and interquartile range |

| |IDENTIFY outliers |

| |CONSTRUCT a boxplot using the five-number summary |

| |CALCULATE numerical summaries with technology |

|Measuring Center: The Mean ($\bar{x}$) |To find the mean of a set of observations, add their values and divide by the number of observations. If the n observations are $x_1, x_2, x_3, \ldots, x_n$, their mean is: |

|The Mean ($\bar{x}$) |$\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n} = \dfrac{1}{n}\sum x_i$ ($\bar{x}$ is pronounced “x-bar”) |

| |The mean is a good way to measure the center when the shape of your distribution is unimodal and symmetric. Because the mean cannot resist the influence of extreme observations, like outliers, it is not a resistant measure. So use caution if such values are present or when your distribution is skewed. |

|Measuring Center: The Median (M) |The median M is the midpoint of a distribution, the number such that half of the observations are smaller and the other half are larger. |

| |To find the median of a distribution: |

| |Arrange all observations from smallest to largest. |

| |If the number of observations n is odd, the median M is the center observation in the ordered list. |

| |If the number of observations n is even, the median M is the average of the two center observations in the ordered list. |

| |The median is resistant to outliers. The median is a better measure of the center when outliers are present or when your distribution is skewed. |

|Comparing the Mean and Median |The mean and median of a roughly symmetric distribution are close together. |

| |If the distribution is exactly symmetric, the mean and median are exactly the same. |

| |In a skewed distribution, the mean is usually farther out in the long tail than is the median. |

|Example |Check Your Understanding page 55 |

| | |
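The rules for the mean and median, and the way a long tail pulls the mean, can be sketched in Python. Both small data sets are made up for illustration.

```python
def mean(values):
    """Add the values and divide by the number of observations."""
    return sum(values) / len(values)

def median(values):
    """Center observation if n is odd; average of the two center observations if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

symmetric = [2, 4, 6, 8, 10]       # made-up, exactly symmetric
right_skewed = [2, 4, 6, 8, 100]   # same data with one extreme high value

print(mean(symmetric), median(symmetric))         # mean and median agree
print(mean(right_skewed), median(right_skewed))   # mean moves toward the tail, median does not
```

Changing 10 to 100 leaves the median at 6 but moves the mean from 6 to 24, which is what "not resistant" means.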

| | |

| |Range = Maximum value – Minimum Value |

| |The range shows the full spread of the data. But it depends only on the smallest and largest values, which could be |

| |outliers. We can improve our description of spread by also looking at the spread of the middle observations. |

| | |

| | |

| |The quartiles mark out the middle half of the data. To calculate the quartiles: |

| |Arrange the observations in increasing order and locate the median. |

| |$Q_1$, the 25th percentile, is the value such that 25% of the data values are less than it. To find $Q_1$, find the median of the first half of the data. |

| |$Q_3$, the 75th percentile, is the value such that 75% of the data values are less than it. To find $Q_3$, find the median of the second half of the data. |

|Measuring Spread: The Interquartile Range |Note: If you have an even number of observations, just take the average of the middle two numbers. |

|(IQR) | |

| |The interquartile range (IQR) covers the range of the middle 50% of data. Because it doesn’t use extreme values, it is |

|The Range |resistant to outliers. Use the IQR as your measure of spread when outliers are present or if data are skewed. When |

| |using the median as your measure of center, use the IQR as your measure of spread. |

| |IQR = $Q_3 - Q_1$ |

| | |

| |See Example page 57 |

| |[pic] |

|The Quartiles ($Q_1$ and $Q_3$) |IQR = Q3 – Q1 |

| |= 42.5 – 15 |

| |= 27.5 minutes |

| | |

| |Interpretation: The range of the middle half of travel times for the New Yorkers in the sample is 27.5 minutes. |
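The "median of each half" method for quartiles can be sketched in Python. The data below are made up (they are not the New York travel times), and many calculators and software packages use slightly different quartile rules, so their answers may not match this method exactly.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, M, Q3) using the median of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                # observations to the left of the median
    upper = ordered[half + (n % 2):]      # skip the median itself when n is odd
    return median(lower), median(ordered), median(upper)

data = [1, 3, 4, 6, 7, 9, 12, 15, 20, 22]   # made-up data
q1, m, q3 = quartiles(data)
iqr = q3 - q1
print(q1, m, q3, iqr)
```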

| | |

| | |

| |The 1.5 x IQR Rule for Outliers |

| |You can use the IQR to find outliers. Call an observation an outlier if it falls more than 1.5 x IQR above $Q_3$ or |

|interquartile range (IQR) |below $Q_1$. |

| | |

| |In the New York travel time data, we found Q1=15 minutes, Q3=42.5 minutes, and IQR=27.5 minutes. |

| |For these data, 1.5 x IQR = 1.5(27.5) = 41.25 |

| |Q1 - 1.5 x IQR = 15 – 41.25 = -26.25 |

| |Q3+ 1.5 x IQR = 42.5 + 41.25 = 83.75 |

| |Any travel time shorter than -26.25 minutes or longer than 83.75 minutes is considered an outlier. |
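The 1.5 x IQR rule can be checked with a short Python sketch. The quartiles Q1 = 15 and Q3 = 42.5 come from the example above; the list of travel times passed to `find_outliers` is made up, and both function names are hypothetical.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) cutoffs from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """List the observations that fall outside the fences."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]

low, high = outlier_fences(15, 42.5)
print(low, high)   # -26.25 83.75

# Made-up travel times, for illustration only.
print(find_outliers([5, 20, 85, 90], q1=15, q3=42.5))
```

Since a travel time cannot be negative, only the upper cutoff matters in practice for these data.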

| | |

| | |

| |Minimum $Q_1$ M $Q_3$ Maximum |

| | |

| |Regular boxplots conceal outliers, so we will use modified boxplots, which plot outliers. |

| | |

| | |

| |Check Your Understanding page 55 |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

|Identifying Outliers | |

| | |

| | |

| | |

| | |

|Example | |

| | |

| | |

| | |

| |See Technology Corner page 61 |

| | |

| | |

| |The standard deviation $s_x$ measures the average distance of the observations from their mean. It is calculated by finding|

|Five Number Summary |an average of the squared distances and then taking the square root. This average squared distance is called the |

| |variance. |

| |Consider the following data on the number of pets owned by a group of 9 children. |

| | |

| |See Example page 62 |

| | |

|Example |1) Calculate the mean. |

| |2) Calculate each deviation. |

| |deviation = observation – mean |

| | |

| | |

| | |

| | |

| | |

| | |

| |3) Square each deviation. |

| |4) Find the “average” squared deviation. |

| |Calculate the sum of the squared deviations |

| |divided by (n-1)…this is called the |

| |variance. |

| |5) Calculate the square root of the variance… |

| |this is the standard deviation. |

| | |

| |“average” squared deviation = 52/(9-1) = 6.5 |

|Construct Calculator Boxplots |This is the variance. |

| | |

|Measuring Spread: The Standard Deviation |Standard deviation = square root of variance |

|($s_x$) |= $\sqrt{6.5} \approx 2.55$ |

| | |
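The five steps for the standard deviation can be sketched in Python. The nine values below are an assumed data set, chosen only because they give a sum of squared deviations of 52 as in the calculation above; the actual data are in the textbook example.

```python
from math import sqrt

def variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n                                   # step 1: the mean
    squared_deviations = [(x - x_bar) ** 2 for x in values]   # steps 2 and 3
    return sum(squared_deviations) / (n - 1)                  # step 4

def standard_deviation(values):
    """Square root of the variance (step 5)."""
    return sqrt(variance(values))

# Assumed data: 9 values with mean 5 and sum of squared deviations 52.
pets = [1, 3, 4, 4, 4, 5, 7, 8, 9]

print(variance(pets))                        # 6.5
print(round(standard_deviation(pets), 2))    # 2.55
```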

| |[pic] |

| | |

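| |The variance ($s_x^2$) of a set of observations is the average of the squares of the deviations of the observations from their mean, where the “average” divides by n - 1. The standard deviation $s_x$ is the square root of the variance. Because its formula contains the mean, the standard deviation is not resistant to outliers. When using the mean as your measure of center, use the standard deviation as your measure of spread. |

| | |

| |$s_x^2 = \dfrac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1} = \dfrac{1}{n - 1}\sum (x_i - \bar{x})^2$ |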

| | |

| | |

| |See Technology Corner page 65 |

| | |

| | |

| | |

| |The median and IQR are usually better than the mean and standard deviation for describing a skewed distribution or a |

| |distribution with outliers. |

| |Use mean and standard deviation only for reasonably symmetric distributions that don’t have outliers. |

| |NOTE: Numerical summaries do not fully describe the shape of a distribution. ALWAYS PLOT YOUR DATA! |

| | |

| | |

| |See Example page 66-67 |

| | |

| | |

| | |

|Standard Deviation ($s_x$) | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

|Variance ($s_x^2$) | |

| | |

| | |

| | |

|Computing Numerical Summaries on | |

|Calculator | |

| | |

| | |

|Choosing Measures of Center and Spread | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

|Summary |
