Stat 101 Exam 1

Important Formulas and Concepts 1

1 Chapter 1

1.1 Definitions

1. Data Any collection of numbers, characters, images, or other items that provide information about something.

2. Categorical/Qualitative Variables Name categories for grouping.

3. Quantitative Variables Variables that contain measured numerical values with measurement units.

4. Identifier Variable A variable with a unique value for each record, such as a Student ID or SSN.

5. Frequency Table Records the totals (counts) for each category and uses the category names to label each row.

6. Relative Frequency Table Displays percentages of the values in each category.

7. Bar Chart Displays the distribution of a categorical variable, showing counts for each category next to each other for easy comparison.

8. Relative Frequency Bar Chart Same as a bar chart but displays the percentage of cases in each category rather than the counts.

9. Pie Chart Shows a whole group of cases as a circle. The circle is sliced into pieces whose sizes are proportional to the fraction of the whole in each category.

10. Distribution Slices up all the possible values of the variable into equal width bins and gives the number of values (or counts) falling into each bin.

11. Histogram Uses adjacent bars to show the distribution of a quantitative variable. Each bar shows the frequency of values falling into each bin.

1 This version: February 3, 2020, by Dale Embers. May not include everything that could be tested. Use it as an additional reference alongside studying all of Chapters 1 - 6.

12. Unimodal Histogram with one peak.

13. Bi-modal Histogram with two peaks.

14. Uniform Histogram that doesn't appear to have any mode. Bars are approximately the same height for each bin.

15. Symmetric Histogram in which the two halves on either side of the center look approximately like mirror images.

16. Skew Histogram that is not symmetric.

17. Skew Left Histogram with a long tail on the left.

18. Skew Right Histogram with a long tail on the right.
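The frequency table, relative frequency table, and equal-width binning described above can be sketched in a few lines of Python. This is only an illustration: the data values and the function names (`frequency_table`, `relative_frequency_table`, `bin_counts`) are made up for the example and are not part of the course material.

```python
from collections import Counter

def frequency_table(values):
    """Count how many times each category appears."""
    return dict(Counter(values))

def relative_frequency_table(values):
    """Percentage of the values that fall in each category."""
    counts = Counter(values)
    n = len(values)
    return {category: 100 * count / n for category, count in counts.items()}

def bin_counts(values, low, width, n_bins):
    """Counts per equal-width bin [low + i*width, low + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - low) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts

# Hypothetical categorical data (majors of six students).
majors = ["Math", "Bio", "Math", "Art", "Bio", "Math"]
freq = frequency_table(majors)
rel = relative_frequency_table(majors)

# Hypothetical quantitative data (exam scores), binned as 50-59, 60-69, ...
scores = [52, 67, 71, 74, 78, 81, 85, 88, 93]
hist = bin_counts(scores, low=50, width=10, n_bins=5)

print(freq)
print(rel)
print(hist)
```

The list `hist` holds the bar heights a histogram of `scores` would show, one entry per bin.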

2 Chapter 2

2.1 Definitions

1. 5 Number Summary: Min, Q1, Median, Q3, Max

2. Boxplot: Displays the 5 number summary as a central box with whiskers that extend to the nonoutlying data values.

3. Use the sample mean and sample standard deviation when the data is symmetric and has no significant outliers. Use the median and IQR when the data is skewed or has significant outliers.

2.2 Formulas

1. Median = Once the data is ordered from smallest to largest, it is the middle value in the data. Divides the histogram into 2 equal pieces.

2. Mean = Average of all of the values = $\bar{x} = \frac{\sum x}{n}$

3. Range = Max - Min

4. Q1 = Median of the lower half of the data

5. Q3 = Median of the upper half of the data

6. IQR = Q3 - Q1

7. Variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n-1}$

8. Standard Deviation: $s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}}$

9. Upper Fence for Boxplot = Q3 + 1.5 × IQR

10. Lower Fence for Boxplot = Q1 - 1.5 × IQR

3 Chapter 3

3.1 Definitions

1. z-score Tells how many standard deviations a value is from the mean. Regardless of direction, the farther a data value is from the mean, the more unusual it is.

2. Standard Normal Model A Normal Model $N(\mu, \sigma)$ with mean $\mu = 0$ and standard deviation $\sigma = 1$.

3. 68-95-99.7 Rule In a normal model, about 68% of values fall within 1 standard deviation of the mean, about 95% fall within 2 standard deviations of the mean, and about 99.7% fall within 3 standard deviations of the mean.

4. Normal Percentile The normal percentile corresponding to a z-score gives the percentage of values in a standard normal distribution found at that z-score or below. It corresponds to the area under the curve to the left of that z-score. See the normal table in the textbook.

3.2 Formulas

1. z-score: $z = \frac{x - \mu}{\sigma}$
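A short sketch of the z-score and the normal percentile, using `NormalDist` from Python's standard `statistics` module in place of the textbook table. The exam-score model N(70, 10) and the value 85 are invented for the example.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

# Hypothetical model: exam scores follow N(70, 10).
z = z_score(85, mu=70, sigma=10)

# Normal percentile: area under the standard normal curve at or below z.
percentile = NormalDist().cdf(z)

def area_within(k):
    """Area within k standard deviations of the mean (standard normal)."""
    std_normal = NormalDist()
    return std_normal.cdf(k) - std_normal.cdf(-k)

print(z, round(percentile, 4))
print([round(area_within(k), 4) for k in (1, 2, 3)])
```

The last line gives the exact areas that the 68-95-99.7 Rule rounds off.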

3.3 Properties about the area under a Normal Curve

1. The total area is 100%

2. The mean is the center of a normal curve

3. A Standard Normal Curve has mean = 0 and standard deviation = 1

4. When a normal curve is split in half at the mean, each side contains 50% of the area

5. The normal curve is symmetric

6. If a normal curve is split with 30% of the area on one side, the other side of the curve is 70% of the area

7. If a normal curve has 60% of the area in the middle, the remaining portions are a total of 40%. This 40% is allocated half to each side. So the far left has 20% of the area, the middle is 60% of the area, and the far right side has 20% of the area.

Textbook Normal Table Note: These tables give the percentage to the left of the z value.
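The area properties above can be checked numerically. The sketch below uses `NormalDist` from the standard `statistics` module as a stand-in for the textbook table; like the table, its `cdf` gives the area to the left of a z value. The helper name `tail_areas` and the example values are only illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def tail_areas(middle):
    """Split the area outside a central region evenly between the two tails."""
    tail = (1 - middle) / 2
    return tail, middle, tail

# Middle 60% of the area leaves 20% in each tail.
left, mid, right = tail_areas(0.60)

# z values that cut off the middle 60% (symmetric about 0).
z_low = std_normal.inv_cdf(left)
z_high = std_normal.inv_cdf(left + mid)

# The table gives area to the left, so area to the right is 1 minus it.
area_left = std_normal.cdf(0.5)
area_right = 1 - area_left

print(left, mid, right)
print(round(z_low, 4), round(z_high, 4))
print(round(area_left, 4), round(area_right, 4))
```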

4 Chapter 4

4.1 Definitions

1. Scatterplot Shows the relationship between 2 quantitative variables.

2. Direction of Scatterplot Positive direction means that as one variable increases, so does the other. Negative direction means that as one variable increases, the other decreases.

3. Form of Scatterplot Is it in a straight line or some other form?

4. Strength of Scatterplot Strong association if there is little scatter around the underlying relationship.

5. Outlier A point that does not fit the overall pattern seen in the scatterplot.

4.2 Formulas

1. z-Scores for a Scatterplot: $z_x = \frac{x - \bar{x}}{s_x}$, $z_y = \frac{y - \bar{y}}{s_y}$

2. Correlation Coefficient: $r = \frac{\sum z_x z_y}{n-1} = \frac{\sum [(x - \bar{x})(y - \bar{y})]}{(n-1) s_x s_y}$
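To see that the two forms of the correlation coefficient agree, here is a minimal sketch that computes r once from the z-scores and once directly from the deviations. The five (x, y) pairs are hypothetical, and the function names are chosen only for this example.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation_from_z(xs, ys):
    """r = sum(z_x * z_y) / (n - 1)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

def correlation_direct(xs, ys):
    """r = sum((x - x_bar)(y - y_bar)) / ((n - 1) * s_x * s_y)."""
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return numerator / ((len(xs) - 1) * sample_sd(xs) * sample_sd(ys))

# Hypothetical paired data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_z = correlation_from_z(xs, ys)
r_direct = correlation_direct(xs, ys)
print(round(r_z, 4), round(r_direct, 4))
```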

4.3 Properties of Correlation Coefficient

1. Always between -1 and 1

2. Does not matter which variable you consider as x and y.

3. Treats x and y symmetrically

4. No units

5. Not affected by changes in scale or by standardizing the variables (changing to z-scores)

6. Depends only on z-scores

7. Does NOT prove causation. Only provides a relationship between 2 variables.
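A few of these properties can be checked on a small data set. The sketch below uses hypothetical data and a helper named `correlation`, which computes r in an algebraically equivalent form: the (n - 1) factors cancel against those inside s_x and s_y. The rescaling shown uses a positive multiplier, as in a change of units.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)

# Swapping the roles of x and y gives the same r.
r_swapped = correlation(ys, xs)

# Changing units (inches to cm for x, adding 10 to y) leaves r unchanged.
r_rescaled = correlation([2.54 * x for x in xs], [y + 10 for y in ys])

# A perfectly linear increasing relationship gives r = 1.
r_perfect = correlation(xs, [3 + 2 * x for x in xs])

print(round(r, 4), round(r_swapped, 4), round(r_rescaled, 4), round(r_perfect, 4))
```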

5 Chapter 5

5.1 Definitions

1. Linear Model Equation of the form $\hat{y} = a + bx$. $\hat{y}$ means estimated values for y.

2. Predicted Values Value of $\hat{y}$ found for a given x-value in the data.

3. Fitted Linear Model $\hat{y} = a + bx$

4. Residual Differences between data values and the corresponding values predicted by the model (observed - expected)

5. $R^2$ Gives the fraction of variability of y accounted for by the least squares linear regression on x. It is an overall measure of how successful the regression is in linearly relating y to x.

6. Least Squares Criterion Specifies the unique line that minimizes the variance of the residuals or the sum of squared residuals.

7. Extrapolation Using the model to predict y for x-values outside the range of the data. It is unsafe in any regression situation, and predictions from extrapolation should not be trusted.

8. Influential Point A point that, if omitted from the data, results in a very different regression model.
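The definitions above can be illustrated with a small least squares fit. The sketch assumes the standard least squares formulas for the slope and intercept, which are not given in this excerpt: b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b·x̄. The data are hypothetical.

```python
def fit_line(xs, ys):
    """Least squares intercept a and slope b for y_hat = a + b*x.

    Assumes the standard formulas b = Sxy / Sxx and a = y_bar - b * x_bar.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical paired data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)

# Predicted values and residuals (observed minus predicted).
predicted = [a + b * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

# R^2: fraction of the variability in y accounted for by the line.
y_bar = sum(ys) / len(ys)
sse = sum(e ** 2 for e in residuals)
sst = sum((y - y_bar) ** 2 for y in ys)
r_squared = 1 - sse / sst

print(round(a, 4), round(b, 4))
print([round(e, 4) for e in residuals])
print(round(r_squared, 4))
```

For this data the fitted model is roughly $\hat{y} = 2.2 + 0.6x$ and about 60% of the variability in y is accounted for. Using the line at x = 50, far outside the observed range of 1 to 5, would be an extrapolation.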
