STATISTICS



Old Business

• Picking up pace

• Mac issues

Topics to Cover

• Frequency distributions/histograms in Mac Excel

• Review last homework

• Central tendency: mean, median, mode

• Percentiles, quartiles,

• Descriptive statistics in JMP

• Measures of Dispersion: the Variance

• Homework assignment

1. Frequency Distributions/Histograms with Mac

Mac only: Status of Excel Data Analysis ToolPak

• Not available for Mac as of 2008

• Alternative: StatPlus:MacLE

• Download and install:

• Might not make frequency distributions (so see below)

Frequency Distributions/Histograms via Excel FREQUENCY() function

• Place data in one column (e.g., a1:a10)

• Place bins in another column (e.g., b1:b4)

• In another column, select vertical range of blank cells, which contains one more than number of cells in bin array (e.g., c1:c5)

• Type formula: frequency(a1:a10, b1:b4), then press COMMAND+ENTER (Mac) or CONTROL+SHIFT+ENTER (PC)

• Note: double-check bin range address; formula editor may obscure first cell.

Explained here:

Class demonstration: Female Dover sole lengths

2. Review Last Homework

1. For males and females separately, make a distribution table:

In separate columns show range, bin, frequency, percentage, cumulative percentage.

Label columns, include units of mg/dL.

2. For females only, make histogram:

• Resize histogram to look nice

• Place legend on bottom (not side)

• Label x-axis: Upper Limit of Range (Cholesterol, mg/dL)

• Add chart title: Distribution of Cholesterol for Females (mg/dL)

• If necessary fix secondary y-axis to range from 0 to 100%

3. Make final report comparing male and female distributions:

Col 1: Range

Col 2: Male frequency

Col 3: Male percentage

Col 4: Female frequency

Col 5: Female percentage

Remember to save your worksheet.

4. Based on the results, what conclusions can you reach concerning differences between male and female patients?

Place results of 1. (male and female), histogram (female), final comparison table, and answer to question into Word document.

Using JMP: Enter data for females into Date Table; produce histogram & basic statistics; cut-and-paste results into the same Word document as above.

3. Measures of Central Tendency: Mean, Median, Mode,

Review: Watch Khan Academy video on Average, Median, Mode



The Arithmetic Mean

[pic] [pic]

Ex: The data represent the number of textbooks purchased by a sample of seven students:

10 4 7 5 7 8 9

[pic] = [pic] = [pic] = 7.14

Excel AVERAGE() FUNCTION

[pic]

The mean is affected by any outliers and skews. Because the mean is nonresistant, there are alternative measures that are more resistant to outliers and skews

The Median

The median is a resistant measure of central tendency that occupies the middle position of data placed in order of magnitude.

If n is odd, the median is the middle number of the data placed in order of magnitude. It occupies the [pic] position.

If n is even, the median is the average of the middle two numbers of the data placed in order of magnitude. It is the average of the numbers in the [pic] and [pic] positions.

Ex Reordering the sample of books: 4 5 7 7 8 9 10.

The median is 7. If there were an eighth person who purchased 12 books, the median would be 7.5.

Excel MEDIAN() FUNCTION

[pic]

The Mode

The mode, by definition, is the most frequently occurring value in a series.

There can be more than one modes

There can be no mode

Excel MODE() FUNCTION

[pic]

4. Percentiles and Quartiles

The kth percentile, Pk, is such that no more than k percent of the data are less than Pk and no more than (100 - k) percent are greater than Pk. Usually used with large data sets.

The first quartile (Q1) is the point that separates the lower 25 percent of the values from the upper 75 percent = value corresponding to the [pic] ordered observation.

The third quartile (Q3) is the point that separates the upper 25 percent of the values from the lower 75 percent = value corresponding to the [pic] ordered observation.

Ex Books: 4 5 7 7 8 9 10.

[pic] = 2, so Q1 = 5; [pic] = 6, so Q3 = 9.

(If position = #.5, average two nearest values; else, if not integer, round.)

5. Descriptive Statistics in JMP

Method 1: Distribution Function

Enter data into a Data Table (Important: do not mix character and numerical values in a column!)

Highlight column (takes some practice; hint: to refresh selection: Rows > Clear Row States

Analyze > Distribution > OK

[pic]

More statistics available by clicking red arrow beside Summary Statistics

Method 2: Summary Function

Tables > Summary

JMP Summary Statistics Menu

[pic]

From Statistics drop-down menu (see above), select statistics one at a time. Selected statistics will then appear in box to right. (Note: drop-down menu does not appear in picture below)

[pic]

For Q1 and Q3, choose Quantile statistic twice, specifying 25% and 75% in this box:

[pic]

Click: OK

[pic]

For more info:

6. Measures of Dispersion: the Variance

Range

Range = Maximum - Minimum

Ex Books: 4 5 7 7 8 9 10 Range = 10 - 4 = 6

Interquartile Range

IQR = Q3 - Q1

Ex The sample of books: Q1 = 5, Q3 = 9, IQR = 9 - 5 = 4

Variance (Population and Sample)

The variance is the average squared distance of observations from the mean.

Population variance formula:

[pic]

The square root of the variance is the standard deviation.

Spreadsheet calculation of population variance:

Ex Books: 4 5 7 7 8 9 10

[pic]

Variance = Average[X – mu]^2 =26.857/7 = 3.84

Video: Variance of a Population



7. Homework

Read pp. 104-117, Prob 3.1, 3.2a [skip(4),(6), (10)], 3.2b

Data for 3.b (bolts.xls) on course website

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download