8. Goodness of fit in regression.

1. Two-way tables. 2. Histograms. 3. mean, median, IQR, z score. 4. skew. 5. Boxplots. 6. Scatterplots and correlation. 7. Regression. 8. Goodness of fit in regression. 9. Common regression pitfalls.

Simple data summaries

- For categorical data, two-way tables can be useful.

- For quantitative data, histograms are useful.

- In a relative frequency histogram, each bar shows the proportion (or percentage) of people in the bin rather than the count.

- Here, n = 25. A relative frequency of 0.2 means that 20% of the people in the sample had 3 quarts, so the number of people with 3 quarts was 0.2 × 25 = 5.

- The bin sizes can be adjusted, and the choice of bin size can change how the histogram looks.

- With histograms, look for symmetry, skew, bimodality, and outliers.

- The range = maximum observed value − minimum observed value.

- For roughly symmetric data, the mean and sd are good summaries of the center and spread.

- When the data are skewed or there are serious outliers, the median and the IQR can be preferable.
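
The arithmetic behind these summaries can be sketched in a few lines of Python. This is only an illustration: the relative-frequency numbers come from the histogram example above, while the list `data` is a hypothetical sample invented here to show how a single serious outlier pulls the mean away from the median.

```python
import statistics

# Relative frequency -> count, using the numbers from the histogram example.
n = 25
rel_freq = 0.2
count = round(rel_freq * n)          # number of people in that bin

# Hypothetical sample (not from the notes) with one serious outlier.
data = [2, 3, 3, 4, 4, 5, 40]

data_range = max(data) - min(data)   # range = maximum - minimum
mean = statistics.mean(data)         # pulled upward by the outlier
sd = statistics.stdev(data)          # sample standard deviation
median = statistics.median(data)     # barely affected by the outlier

print(count, data_range, round(mean, 2), median)
```

In this made-up sample the mean is more than twice the median, which is the kind of situation where the median and the IQR may be the better summaries.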

3. mean, median, IQR, z score.

- The median is the middle value in the sorted list of observations. It is a value M such that at least 50% of the observations are ≤ M and at least 50% are ≥ M. Different software packages use different conventions, but we will use the convention that, if there is a range of possible medians, you take the middle of that range.

- For example, suppose the data are 1, 3, 7, 7, 8, 9, 12, 14. Any value between 7 and 8 splits the data in half, so M = 7.5.

- Suppose 25% of the observations lie below a certain value x. Then x is called the lower quartile (or 25th percentile).

- Similarly, if 25% of the observations are greater than x, then x is called the upper quartile (or 75th percentile).

- The lower quartile can be calculated by finding the median M and then taking the median of the values below M. Similarly, the upper quartile is the median of the values greater than M.
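
As a rough sketch of the rule just described, the following Python computes the median and quartiles for the example data. The helper name `quartiles` is hypothetical, and the code reads "values below M" literally as values strictly less than M, which is an assumption; other software may split the data differently and report slightly different quartiles. The IQR is taken as the upper quartile minus the lower quartile.

```python
import statistics

def quartiles(values):
    """Return (lower quartile, median, upper quartile) using the rule above.

    Assumes at least one value lies strictly below and strictly above the median.
    """
    # statistics.median averages the two middle values when n is even,
    # which matches the "middle of the range" convention in these notes.
    m = statistics.median(values)
    lower = statistics.median([v for v in values if v < m])
    upper = statistics.median([v for v in values if v > m])
    return lower, m, upper

data = [1, 3, 7, 7, 8, 9, 12, 14]
q1, m, q3 = quartiles(data)
iqr = q3 - q1                # interquartile range

print(q1, m, q3, iqr)        # 5.0 7.5 10.5 5.5
```

Here the values below M = 7.5 are 1, 3, 7, 7 (median 5) and the values above are 8, 9, 12, 14 (median 10.5), so the IQR is 5.5.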
