Wednesday, August 11 (131 minutes)



AP STATISTICS: Chapter 1 Exploring Data          Name ___________________________
Intro: Making Sense of Data                      Date _____________ Period ________

What is statistics?

What is data analysis?

Definition:
Individuals-
Variable-

Ex 1: Identify the individuals and variables for a high school’s student database.

2 questions to ask when you first meet a new set of data:
1.
2.

Definition:
Categorical variable-
Quantitative variable-

Ex 2: Identify categorical and quantitative variables for a high school’s student database.

Do we ever use numbers to describe the values of a categorical variable? Give some examples.

Ex 3: Who are the individuals in this data set?
What variables were used? Identify each as categorical or quantitative.
Describe the individual in the highlighted row.

Rows vs Columns:

What is a distribution?

How to explore data:

1.1 Analyzing Categorical Data

Pie Chart: Use this type of graph when you want to emphasize each category’s relation to the whole (displays categorical data).
Bar Graph: These graphs display the distribution of a categorical variable. Use this type of graph when you want to compare parts of a whole.
Two-Way Table: a table that describes two categorical variables in counts or percents.

What is the difference between a frequency table and a relative frequency table?

Ex 1: What Personal Media Do You Own?
Here are the percents of 15-18 year olds who own the following personal media devices, according to the Kaiser Family Foundation:

Device                          Percent Who Own
Cell Phone                      85%
MP3 Player                      83%
Handheld Video Game Player      41%
Laptop                          38%
Portable CD/Tape Player         20%

Make a well-labeled bar graph to display the data. Describe what you see.
Would it be appropriate to make a pie chart for this data? Why or why not?

What are some common ways to make a misleading graph? (pp. 11 & 12)

What is wrong with the following graph?

The marginal distribution of one of the categorical variables in a two-way table of counts is the distribution of values of that variable among all individuals described by the table.
A conditional distribution of a variable describes the values of that variable among individuals who have a specific value of another variable. There is a separate conditional distribution for each value of the other variable.
We say that there is an association between two variables if specific values of one variable tend to occur in common with specific values of the other.

------------------------------------------------------------

Ex 2: Cell phones
The Pew Research Center asked a random sample of 2024 adult cell phone owners from the United States which type of cell phone they own: iPhone, Android, or other (including non-smart phones). Here are the results, broken down by age category.
Use the cell phone data to calculate the marginal distribution (in percents) of type of cell phone.
Make a graph to display the marginal distribution. Describe what you see.
What percent of each age group is an iPhone user (conditional distribution)?
Make a side-by-side bar graph of cell phone type by age group.

1.2 Displaying Quantitative Data with Graphs

Stemplot: Displays distributions of quantitative data. These graphs work best for small numbers of observations that are all greater than 0. This will give you a quick picture of the shape of a distribution while including the actual numerical values in the graph. When you wish to compare two related distributions, a back-to-back stemplot with common stems is useful.
Histograms: This graph shows the distribution of counts or percents among the values of a single quantitative variable.
Classes (or bins or bars) should be of equal width.
A good rule of thumb is to use a minimum of five classes.
Be sure to pay attention to whether you are reading/creating a frequency histogram (number in a group) or a percent histogram.

Dotplot: One of the simplest ways to graphically represent quantitative data.

How to Examine the Distribution of a Quantitative Variable

Ex 1: Smart Phone Battery Life
Here is the estimated battery life for each of 9 different smart phones (in minutes). Make a dotplot of the data and describe what you see.

Smart Phone          Battery Life (minutes)
Apple iPhone         300
Motorola Droid       385
Palm Pre             300
Blackberry Bold      360
Blackberry Storm     330
Motorola Cliq        360
Samsung Moment       330
Blackberry Tour      300
HTC Droid            460

Ex 2: Time Spent on Internet: Graph the data using a histogram.

Time on the Internet (min)    Frequency
0                             7
10                            1
20                            3
30                            7
40                            1
45                            1
60                            15
90                            3
120                           14
180                           10
210                           1
240                           10
270                           2
300                           9
360                           3

Definition: Symmetric and Skewed Distributions (describing shape)

Illustrate the following distribution shapes:
Symmetric
Skewed right
Skewed left
Unimodal
Bimodal
Uniform

What is the most important thing to remember when you are asked to compare two distributions?

Ex 3: Energy Cost: Top vs. Bottom Freezers
How do the annual energy costs (in dollars) compare for refrigerators with top freezers and refrigerators with bottom freezers? The data below is from the May 2010 issue of Consumer Reports.

1.3 Describing Quantitative Data with Numbers

CENTER: Mean or Median

Mean: average value of a population (μ, "mu") or sample (x̄, “x-bar”). BE CAREFUL, the mean is sensitive to the influence of a few extreme observations! We say that the mean is not a resistant measure of center and use it to describe the center of symmetric distributions.
Median: middle value of a data set when the data is in order from least to greatest. It is the value such that 50% of the data falls below and 50% falls above (which makes it the 50th percentile). The median is more resistant to extreme observations than the mean, so it is used to describe the center of skewed distributions.
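To see what "resistant" means in practice, here is a minimal Python sketch that uses the nine battery-life values from Ex 1 (Smart Phone Battery Life). The second list is hypothetical: the largest value is swapped for a made-up extreme observation of 4600 minutes, purely for illustration.

```python
from statistics import mean, median

# Battery life (minutes) for the 9 smart phones in Ex 1
battery = [300, 385, 300, 360, 330, 360, 330, 300, 460]

# Hypothetical variant: the largest value (460) is replaced by a
# made-up extreme observation to show how each measure reacts
battery_extreme = [300, 385, 300, 360, 330, 360, 330, 300, 4600]

# Mean and median of the original data
print(round(mean(battery), 1), median(battery))

# Mean and median after introducing the extreme value
print(round(mean(battery_extreme), 1), median(battery_extreme))
```

The mean is pulled far toward the extreme value while the median stays where it was, which is the reason the median is preferred for skewed distributions.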
Sometimes, the median is referred to as Q2.

Comparing the Mean and Median: The mean and the median of a roughly symmetric distribution are close together. If the distribution is exactly symmetric, the mean and median will be exactly the same. In a skewed distribution, the mean is farther out in the long tail than the median.

SPREAD: Interquartile Range

Range is the difference between the maximum observation and the minimum observation (range = max - min).
The first quartile (Q1) is the value in a data set such that 25% of the data falls below it (25th percentile).
The third quartile (Q3) is the value in a data set such that 75% of the data falls below it (75th percentile).

To calculate the quartiles:
1. Arrange the observations in increasing order and locate the median M.
2. The first quartile Q1 is the median of the first half of the observations (the observations to the left of M).
3. The third quartile Q3 is the median of the last half of the observations (the observations to the right of M).

The interquartile range is the distance between the quartiles (IQR = Q3 - Q1) and is a measure of spread that is more resistant to outliers than the range. IQR is used as a measure of spread for skewed distributions (along with median for center).

Example 1: People say that it takes a long time to get to work in New York State due to the heavy traffic near big cities. What do the data say? Here are the travel times in minutes of 20 randomly chosen New York workers:

10  30   5  25  40  20  10  15  30  20
15  20  85  15  65  15  60  60  40  45

Make a stemplot of the data. Be sure to include a key.
Find and interpret the median of the travel times.
Find and interpret the IQR of travel times.
Find the mean of the travel times. How does the mean compare to the median?
What does this confirm for you about the shape of the distribution of travel times?

UNUSUAL FEATURES: Identifying Outliers

An observation is called an outlier if it falls more than 1.5 × IQR above the third quartile or below the first quartile.
upper bound = Q3 + 1.5 (IQR)
lower bound = Q1 - 1.5 (IQR)

Example 2: Determine if the distribution of travel times from Example 1 has an outlier. Show your calculations and justify your answer.

The Five-Number Summary and Boxplots

The five-number summary of a set of observations consists of the minimum, first quartile (Q1), the median M (Q2), third quartile (Q3), and the maximum, written in order from smallest to largest.

Min    Q1    M    Q3    Max

These five numbers divide each distribution roughly into quarters. About 25% of the data values fall between the minimum and Q1, about 25% are between Q1 and the median, about 25% are between the median and Q3, and about 25% are between Q3 and the maximum. This five-number summary leads us to a new graph, the boxplot (aka “box and whisker plot”).
A central box spans the quartiles Q1 and Q3.
A line in the box marks the median, M.
Lines extend from the box out to the smallest and largest observations.

Example 3: Create a box and whisker plot of the distribution of travel times from Example 1.

Example 4: The 2009 roster of the Dallas Cowboys professional football team included 10 offensive linemen. Their weights (in pounds) were

318  353  313  318  326  307  317  311  311

Find the five-number summary for these data using the calculator.
Calculate the IQR. Interpret this value in context.
Determine whether there are any outliers using the 1.5 × IQR rule.
Draw a boxplot of the data.

SPREAD: Standard Deviation

The standard deviation measures the average distance of the observations (data points) from their mean. It is calculated by finding an average of the squared distances and then taking the square root. This average squared distance is called the variance.
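The calculation just described can be sketched in a few lines of Python. The data set below is hypothetical, chosen only so the arithmetic is easy to follow, and the sketch assumes the sample version of the calculation, which divides the sum of squared deviations by n - 1 rather than n.

```python
from math import sqrt
from statistics import stdev

# Hypothetical sample, chosen only to keep the arithmetic simple
data = [2, 4, 4, 6, 9]

n = len(data)
x_bar = sum(data) / n                      # mean: 25 / 5 = 5.0

# Deviation of each observation from the mean, then squared
deviations = [x - x_bar for x in data]     # -3.0, -1.0, -1.0, 1.0, 4.0
squared = [d ** 2 for d in deviations]     # 9.0, 1.0, 1.0, 1.0, 16.0

# Sample variance: sum of squared deviations divided by n - 1
variance = sum(squared) / (n - 1)          # 28 / 4 = 7.0

# Standard deviation: square root of the variance
s = sqrt(variance)

print(x_bar, variance, round(s, 2))

# The standard library's sample standard deviation should agree
print(round(stdev(data), 2))
```

Laying out the deviations and squared deviations as separate lists mirrors the table a student would build by hand before summing the last column.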
Standard deviation should be used as the measure of spread for symmetric distributions (along with mean for center).

From the AP Formula Sheet: $s_x = \sqrt{\frac{1}{n-1}\sum (x_i - \bar{x})^2}$
Variance (not on the AP Formula Sheet) = $s_x^2$

Example 5: The heights (in inches) of the five starters on a basketball team are 67, 72, 76, 76, and 84.
Find and interpret the mean.
Make a table that shows, for each value, its deviation from the mean and its squared deviation from the mean.
Show how to calculate the variance and standard deviation from the values in your table.
Interpret the meaning of the standard deviation in this setting.
Use your calculator to confirm the standard deviation of the heights.

What is the difference between σ and s?

Choosing Measures of Center and Spread: We now have a choice between two descriptions of the center and spread of a distribution: the median and IQR, OR the mean and standard deviation. So, how do we know which one to use?
The median and IQR work for everything! You definitely want to use them when describing a skewed distribution since the median and IQR are resistant to outliers.
Use the mean and standard deviation only for reasonably symmetric distributions that don’t have outliers.

Example 6: For their final project, a group of AP Statistics students investigated their belief that females text more than males. They asked a random sample of students from their school to record the number of text messages sent and received over a two-day period. Here are their data:

Males:   127  44  28  83  0  6  78  6  5  213  73  20  214  28  11
Females: 112  203  102  54  379  305  179  24  127  65  41  27  298  6  130  0

What conclusion should the students draw? Give appropriate evidence to support your answer.
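One way to gather evidence for a comparison like Example 6 is to compute a five-number summary for each group and apply the 1.5 × IQR rule from section 1.3. The sketch below is only an illustration: the function names `five_number_summary` and `outliers` are hypothetical, the quartiles follow the median-of-halves steps given in these notes (a calculator or other software may use a slightly different rule), and the text-message counts are transcribed from Example 6, so treat the exact values as an assumption.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, M, Q3, max); assumes at least two observations."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]        # observations to the left of M
    upper = data[-half:]       # observations to the right of M
    return data[0], median(lower), median(data), median(upper), data[-1]

def outliers(values):
    """Return the observations flagged by the 1.5 x IQR rule."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr
    return [x for x in values if x < lower_bound or x > upper_bound]

# Text-message counts as transcribed from Example 6 (an assumption)
males = [127, 44, 28, 83, 0, 6, 78, 6, 5, 213, 73, 20, 214, 28, 11]
females = [112, 203, 102, 54, 379, 305, 179, 24,
           127, 65, 41, 27, 298, 6, 130, 0]

for label, group in (("Males", males), ("Females", females)):
    print(label, five_number_summary(group), "outliers:", outliers(group))
```

Because both groups are summarized with the median and IQR, the comparison stays reasonable even if one distribution turns out to be skewed or to contain outliers.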