Benedictine University




Informing today – Transforming tomorrow

SYLLABUS

Course: MGT 150–Business Statistics I–Spring, 2015

Instructor: Jeffrey M. Madura

B.A. University of Notre Dame, 1967

M.B.A. Northwestern University, 1971

C.P.A. State of Illinois, 1979

Phone: 630-829-6467

Email: jmadura@ben.edu

Website: ben.edu/faculty/jmadura/home.htm

Text: Modern Business Statistics with Microsoft Office Excel, 5th edition, Anderson, Sweeney & Williams, South-Western/Cengage, 2015.

ISBN: 978-1-285-43330-1 (hard cover)

Other Required: Aplia interactive learning/assignment system.

TI-83 or TI-84 calculator.

Course Objectives:

The course addresses the following formal College of Business Program Objectives:

Students in this program will receive a thorough grounding in: Mathematics and

Statistics.

This course emphasizes the following IDEA objectives:

Learning fundamental principles, generalizations, or theories.

Learning to apply course material to improve thinking, problem-solving, and

decision-making.

Developing specific skills, competencies and points of view needed by professionals

in the fields most closely related to this course.

Course Description: (from the Catalog)

Basic course in statistical technique, includes measures of central tendency, variability, probability theory,

sampling, estimation and hypothesis testing. Prerequisite: MATH 105 or MATH 110. Three semester hours.

This is a course in introductory statistics. The orientation is toward applications and problem-solving, not mathematical theory. The instructor intends that students gain an appreciation for the usefulness of statistical methods in analyzing data commonly encountered in business and the social and natural sciences. The course is a framework within which students may learn the subject matter. This framework consists of a program of study, opportunity for questions/discussion, explanation, and evaluative activities (quizzes). The major topics are:

o Data and Statistics

o Descriptive Statistics: Tabular and Graphical Presentations

o Descriptive Statistics: Numerical Measures

o Introduction to Probability

o Discrete Probability Distributions

o Continuous Probability Distributions

o Sampling and Sampling Distributions

o Interval Estimation, Means and Proportions

o Hypothesis Tests, Means and Proportions

Quizzes and Grades: The course is divided into five three-week parts, with a quiz at the end

of each part. Dates are subject to change.

Quiz 1 Feb. 5

Quiz 2 Feb. 26

Quiz 3 Mar. 26

Quiz 4 Apr. 16

Quiz 5 Finals Week

Quizzes will constitute 2/3 of your grade. The other 1/3 will be your score on assignments. Class participation may also be a factor.

Grade requirements: A--90%, B--80%, C--60%, D--50%.

There may also be other assignments requiring analysis of data using Excel, and there may be a term project, with weight equal to one quiz.

It is the responsibility of any student who is unsure of the grading scale, course requirements,

or anything else in this course outline to ask the instructor for clarification.

Homework Assignments: There will be 10 Aplia homework assignments. Due dates are listed

in the Aplia system.

The assignments will constitute 1/3 of the course grade. To accommodate the occasional instance when you cannot meet an Aplia deadline, the lowest assignment will be dropped. Assignments will be handled by Aplia. You must access the Aplia website, which means you must register for an account at: .

Please register within 24 hours of the first class meeting.

The computer is absolutely unforgiving about accepting late assignments. Time is kept at Aplia, and not by the computer you are working on. You may appeal grading decisions made by the computer, if you can demonstrate that an error has been made.

Faculty members have observed that the worst thing some students do in a course is not think about course material every day. They sometimes let weeks go by and then try to learn all the material in one or two days. This usually does not work. The weekly assignments will require keeping up-to-date.

Calculators: Calculators will be required for the computational portion of each quiz. Bring your calculator to every class and verify each computation performed. The TI-83 is the standard for this course.

Recommended Exercises: Students should work as many as possible of the even-numbered exercises in the text. Proficiency gained from practice on these will help when similar problems appear on quizzes. Answers to even-numbered exercises are at the back of the text.

Assignments: Non-Aplia assignments must be turned in during class on the day they are due. Assignments turned in after this time but before the assignment is handed back may receive one-half credit. Assignments turned in after the hand-back can no longer be accepted for credit.

Attendance: Attendance will be taken occasionally and randomly. Frequent absences will be noticed, and they will have an adverse impact on quiz performance and your final grade. Two or more absences on days when quizzes are handed back will lower your grade by one letter grade.

Missed Quizzes: Make-up quizzes will be given only if a quiz was missed for a good and documented reason. If a make-up is given, the quiz score will be reduced 20% in an effort to maintain some degree of fairness to those who took the quiz at the proper time.

Use of Class Time: Come to class prepared to discuss the material assigned, and to contribute

to the solution of assigned problems.

Special Needs: If you have a documented learning, psychological, or physical disability, you may be eligible for reasonable academic accommodations or services.  To request accommodations or services, please contact Jennifer Rigor in the Student Success Center, 015 Krasa Student Center, 630-829-6512.  All students are expected to fulfill essential course requirements.  The University will not waive any essential skill or requirement of a course or degree program.

Academic Honesty: The search for truth and the dissemination of knowledge are the central mission of a university. Benedictine University pursues these missions in an environment guided by our Roman Catholic tradition and our Benedictine heritage. Integrity and honesty are therefore expected of all members of the community, including students, faculty members, administration, and staff. Actions such as cheating, plagiarism, collusion, fabrication, forgery, falsification, destruction, multiple submission, solicitation, and misrepresentation are violations of these expectations and constitute unacceptable behavior in the University community. The penalties for such actions can range from a private verbal warning all the way to expulsion from the University. The University’s Academic Honesty Policy is available at .

In this course, academic honesty is expected of all class participants. If your name is on the work submitted, it is expected that you alone did the work. For example, in terms of quizzes, this means that copying from another paper, unauthorized collaboration of any sort, or the use of “cribs” of any kind is a breach of academic honesty. The penalties for a breach of academic honesty in this course are (1) a zero for the assignment or quiz for the first offense, and (2) an “F” for the course for a subsequent offense by the same person(s).

Exception: Activities in the course that are designated as "group work."

Electronic Devices: Bring your TI-83/84 to every class. Turn off or mute your phone before class. Using your laptop or tablet to follow class examples in Excel is encouraged, but only the TI calculator is permitted for quizzes.

Feel free to see me if there is anything else of concern to you. Your comments about this course or any course are always welcome and appreciated. The student is responsible for the information in the syllabus and should ask for clarification for anything in the syllabus about which they are unsure.

Other Grading Policies:

Students on Academic Probation are not eligible for a grade of (I) incomplete.

Students who are not enrolled in class (either for credit or audit) cannot attend the

class and cannot receive credit for the course.

Students cannot submit additional work after grades have been submitted to alter

their grade (except in cases of temporary grades such as I, X, IP).

Make-up exams or assignments must be completed within one week of the scheduled due date. Failure to attend class does not excuse the student from meeting deadlines for assigned work. Any student who is unsure of the grading scale or course requirements is responsible for clarifying questions with the instructor.

Essential Ideas, Terminology, Skills/Procedures, and Concepts for Each Part of the Course

Part I

Two Types of Statistics: Descriptive and Inferential

Descriptive Statistics--purpose: to communicate characteristics of a set of data

Characteristics: Mean, median, mode, variance, standard deviation, skewness, etc.

Charts, graphs

Inferential Statistics--purpose: to make statements about population parameters based on sample statistics

Population--group of interest being studied; often too large to sample every member

Sample--subset of the population; must be representative of the population

Random sampling is a popular way of obtaining a representative sample.

Parameter--a characteristic of a population, usually unknown, often can be estimated

Population mean, population variance, population proportion, etc.

Statistic--a characteristic of a sample

Sample mean, sample variance, sample proportion, etc.

Two ways of conducting inferential statistics

Estimation

Point estimate--single number estimate of a population parameter, no recognition of uncertainty

such as: "40" to estimate the average age of the voting population

Interval estimation--point estimate with an error factor, as in: "40 ± 5"

The error factor provides formal and quantitative recognition of uncertainty.

Confidence level (confidence coefficient)--the probability that the parameter being

estimated actually is in the stated range

Hypothesis testing

Null hypothesis--an idea about an unknown population parameter, such as: "In the population,

the correlation between smoking and lung cancer is zero."

Alternate hypothesis--the opposite idea about the unknown population parameter, such

as: "In the population, the correlation between smoking and lung cancer is not zero."

Data are gathered to see which hypothesis is supported. The result is either rejection

or non-rejection (acceptance) of the null hypothesis.

Four types of data

Nominal

Names, labels, categories (e.g. cat, dog, bird, rabbit, ferret, gerbil)

Ordinal

Suggests order, but computations on the data are impossible or meaningless (e.g. Pets can be listed in order of popularity--1-cat, 2-dog, 3-bird, etc.--but the difference between cat and dog is not related to the difference between dog and bird.)

Interval

Differences are meaningful, but they are not ratios. There is no natural zero point (e.g. clock time-- the difference between noon and 1 p.m. is the same amount of time as the difference between 1 p.m. and 2 p.m. But 2 p.m. is not twice as late as 1 p.m. unless you define the starting point of time as noon, thereby creating a ratio scale)

Ratio

Differences and ratios are both meaningful; there is a natural zero point. (e.g. Length--8 feet is twice as long as 4 feet, and 0 feet actually does mean no length at all.)

Two types of statistical studies

Observational study (naturalistic observation)

Researcher cannot control the variables under study; they must be taken as they are found (e.g. most research in astronomy).

Experiment

Researcher can manipulate the variables under study (e.g. drug dosage).

Characteristics of Data

Central tendency--attempt to find a "representative" or "typical" value

Mean--the sum of the data items divided by the number of items, or Σx / n

More sensitive to outliers than the median

Outlier--data item far from the typical data item

Median--the middle item when the items are ordered high-to-low or low-to-high

Also called the 50th percentile

Less sensitive to outliers than the mean

Mode--most-frequently-occurring item in a data set

Dispersion (variation or variability)--the opposite of consistency

Variance--the Mean of the Squared Deviations (MSD), or Σ(x-xbar)²/n

Deviation--difference between a data item and the mean

The sum of the deviations in any data set is always equal to zero.

Standard Deviation--square root of the variance

Range--difference between the highest and lowest value in a data set

Coefficient of Variation—measures relative dispersion—CV = ssd / x-bar (or est. σ / μ)

Skewness--the opposite of symmetry

Positive skewness--mean exceeds median, high outliers

Negative skewness--mean less than median, low outliers

Symmetry--mean, median, mode, and midrange about the same

Kurtosis--degree of relative concentration or peakedness

Leptokurtic--distribution strongly peaked

Mesokurtic--distribution moderately peaked

Platykurtic--distribution weakly peaked
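
The measures of central tendency and dispersion listed above can be checked by hand or with a few lines of code. The following is a minimal sketch, not part of the course materials; the function name describe and the data set ages are hypothetical. It uses the divisor n for the variance, matching the MSD definition above.

```python
import statistics

def describe(data):
    """Return the Part I descriptive measures for a list of numbers."""
    n = len(data)
    mean = sum(data) / n                             # Σx / n
    deviations = [x - mean for x in data]            # these always sum to zero
    variance = sum(d ** 2 for d in deviations) / n   # MSD, divisor n
    ssd = variance ** 0.5                            # square root of the variance
    return {
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "range": max(data) - min(data),
        "variance": variance,
        "ssd": ssd,
        "cv": ssd / mean,                            # coefficient of variation
    }

ages = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data set
summary = describe(ages)
print(summary)
```

For this data set the mean (5) exceeds the median (4.5), which is the pattern described above for positive skewness.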

Symbols & "Formula Sheet No. 1"

Descriptive statistics

Sample Mean--"xbar" (x with a bar above it)

Sample Variance--"svar" (the same as MSD for the sample)

Also, the "mean of the squares less the square of the mean"

Sample Standard Deviation--"ssd"--square root of svar

Population parameters (usually unknown, but can be estimated)

Population Mean--"μ" (mu)

Population Variance--"σ²" (sigma squared) (MSD for the population)

Population Standard Deviation--"σ" (sigma)--square root of σ²

Inferential statistics--estimating of population parameters based on sample statistics

Estimated Population Mean--"μ^" (mu hat)

The sample mean is an unbiased estimator of the population mean.

Unbiased estimator--just as likely to be greater than as less than the parameter being estimated

If every possible sample of size n is selected from a population, as many sample

means will be above as will be below the population mean.

Estimated Population Variance--"σ^2" (sigma hat squared)

The sample variance is a biased estimator of the population variance.

Biased estimator--not just as likely to be greater than as less than the parameter

being estimated

If every possible sample of size n is selected from a population, more of the sample

variances will be below than will be above the population variance.

The reason for this bias is the probable absence of outliers in the sample.

The variance is greatly affected by outliers.

The smaller a sample is, the less likely it is to contain outliers.

Note how the correction factor's [n / (n-1) ] impact increases as the sample size decreases.

This quantity is also widely written "s²" and referred to as the "sample variance."

In this context "sample variance" does not mean variance of the sample; it is, rather, a shortening

of the cumbersome phrase "estimate of population variance computed from a sample."

Estimated Population Standard Deviation--"σ^" (sigma hat)--square root of σ^2

The bias considerations that apply to the estimated population variance also apply to the estimated population standard deviation.

This quantity is also widely written "s" and referred to as the

"sample standard deviation."

In this context "sample standard deviation" does not mean standard deviation of the sample;

it is, rather, a shortening of the cumbersome phrase "estimate of population standard

deviation computed from a sample."

Calculator note--some calculators, notably TI's, compute two standard deviations

The smaller of the two is the one we call "ssd"

TI calculator manuals call this the "population standard deviation."

This refers to the special case in which the entire population is included in the sample;

then the sample standard deviation (ssd) and the population standard deviation are the same.

(This also applies to means and variances.) There is no need for inferential statistics in such cases.

The larger of the two is the one we call σ^ (sigma-hat) (estimated population standard deviation).

TI calculator manuals call this the "sample standard deviation."

This refers to the more common case in which "sample standard deviation" really means estimated

population standard deviation, computed from a sample.
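
The difference between the two standard deviations can be illustrated in Python, whose standard statistics module offers the same pair of computations as the calculator. This is a sketch with a hypothetical sample; pvariance and pstdev use the divisor n (our svar and ssd), while variance and stdev use the divisor n-1 (our estimated population variance and sigma-hat).

```python
import math
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample
n = len(sample)

svar = statistics.pvariance(sample)      # divisor n: variance of the sample itself
ssd = statistics.pstdev(sample)          # the smaller of the two standard deviations
var_hat = statistics.variance(sample)    # divisor n-1: estimated population variance
sigma_hat = statistics.stdev(sample)     # the larger of the two standard deviations

# The estimate equals the sample variance times the correction factor n / (n-1).
assert math.isclose(var_hat, svar * n / (n - 1))

# The correction factor matters most for small samples.
correction = {size: size / (size - 1) for size in (2, 5, 10, 100)}
print(ssd, sigma_hat, correction)
```

With a sample of 2 the correction doubles the variance; with a sample of 100 it adds only about 1%.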

Significance of the Standard Deviation

Normal distribution (empirical rule)--empirical: derived from experience

Two major characteristics: symmetry and center concentration

Two parameters: mean and standard deviation

"Parameter," in this context, means a defining characteristic of a distribution.

Mean and median are identical (due to symmetry) and are at the high point.

Standard deviation--distance from mean to inflection point

Inflection point--the point where the second derivative of the normal curve is equal to zero,

or, the point where the curvature changes from "right" to "left" (or vice-versa), as when

you momentarily travel straight on an S-curve on the highway

z-value--distance from mean, measured in standard deviations

Areas under the normal curve can be computed using integral calculus.

Total area under the curve is taken to be 1.000 or 100%

Tables enable easy determination of these areas.

about 68-1/4%, 95-1/2%, and 99-3/4% of the area under a normal curve lie within

one, two, and three standard deviations from the mean, respectively

Many natural and economic phenomena are normally distributed.
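
The empirical-rule percentages come from areas under the normal curve. As a sketch (the function name area_within is hypothetical), the area within z standard deviations of the mean can be computed from the error function in Python's math module instead of a table.

```python
import math

def area_within(z):
    """Area under a normal curve within z standard deviations of the mean."""
    return math.erf(z / math.sqrt(2))

for z in (1, 2, 3):
    print(z, round(100 * area_within(z), 2))   # percent of the total area
```

The three results are close to the 68-1/4%, 95-1/2%, and 99-3/4% figures quoted above.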

Chebyshev's Theorem (also spelled Tchebysheff; P. L. Chebyshev, 1821-1894)

What if a distribution is not normal? Can any statements be made as to what percentage of the area lies

within various distances (z-values) of the mean?

Chebyshev proved that certain minimum percentages of the area must lie within various

z-values of the mean.

The minimum percentage for a given z-value, stated as a fraction, is [ (z²-1) / z² ]

Chebyshev's Theorem is valid for all distributions.
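
A short sketch of the minimum-fraction formula follows. The function name chebyshev_minimum is hypothetical, and returning 0 for z-values of 1 or less is an assumption made here because the formula gives no useful bound in that range.

```python
def chebyshev_minimum(z):
    """Minimum fraction of any distribution within z standard deviations of the mean."""
    if z <= 1:
        return 0.0            # the bound is uninformative for z <= 1
    return (z ** 2 - 1) / z ** 2

for z in (2, 3):
    print(z, chebyshev_minimum(z))
```

For z = 2 the guaranteed minimum is 75%, well below the roughly 95-1/2% that applies when the distribution is known to be normal.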

Other measures of relative standing

Percentiles--A percentile is the percentage of a data set that is below a specified value.

Percentile values divide a data set into 100 parts, each with the same number of items.

The median is the 50th percentile value.

Z-values can be converted into percentiles and vice-versa.

A z-value of +1.00, for example, corresponds to the 84.13th percentile.

The 95th percentile, for example, corresponds to a z-value of +1.645.

A z-value of 0.00 is the 50th percentile, the median.
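
The conversions between z-values and percentiles above assume a normal distribution. A minimal sketch using NormalDist from Python's statistics module is shown here; the two function names are hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def z_to_percentile(z):
    """Percent of a normal distribution lying below the given z-value."""
    return 100 * standard_normal.cdf(z)

def percentile_to_z(percentile):
    """z-value below which the given percent of a normal distribution lies."""
    return standard_normal.inv_cdf(percentile / 100)

print(round(z_to_percentile(1.0), 2))
print(round(percentile_to_z(95), 3))
```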

Deciles

Decile values divide a data set into 10 parts, each with the same number of items.

The median is the 5th decile value.

The 9th decile value, for example, separates the upper 10% of the data set from the

lower 90%. (Some would call this the 1st decile value.)

Quartiles

Quartile values divide a data set into 4 parts, each with the same number of items.

The median is the 2nd quartile value.

The 3rd quartile value (Q3), for example, separates the upper 25% of the data set from the lower 75%.

Q3 is the median of the upper half; Q1 (lower quartile) is the median of the lower half

Other possibilities: quintiles (5 parts), stanines (9 parts)

Some ambiguity in usage exists, especially regarding quartiles--For example, the phrase "first quartile" could mean one of two things: (1) It could refer to the value that separates the lower 25% of the data set from the upper 75%, or (2) It could refer to the members, as a group, of the lower 25% of the data.

Example (1): "The first quartile score on this test was 60."

Example (2): "Your score was 55, putting you in the first quartile."

Also the phrase "first quartile" is used by some to mean the 25th percentile value, and by others to mean the 75th percentile value. To avoid this ambiguity, the phrases "lower quartile," "middle quartile," and "upper quartile" may be used.
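
The "median of each half" description of the quartiles can be sketched as follows. The function name and the test scores are hypothetical, and leaving the middle item out of both halves when the number of items is odd is an assumption; other conventions exist and give slightly different values.

```python
import statistics

def quartiles(data):
    """Return (lower quartile, median, upper quartile) by the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    upper_half = ordered[half + n % 2:]   # skip the middle item when n is odd
    return (
        statistics.median(lower_half),
        statistics.median(ordered),
        statistics.median(upper_half),
    )

scores = [55, 60, 62, 70, 75, 80, 85, 90, 95]   # hypothetical test scores
print(quartiles(scores))
```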

Terminology

Statistics, population, sample, parameter, statistic, qualitative data, quantitative data, discrete data, continuous data, nominal measurements, ordinal measurements, interval measurements, ratio measurements, observational study (naturalistic observation), experiment, precision, accuracy, sampling, random sampling, stratified sampling, systematic sampling, cluster sampling, convenience sampling, representativeness, inferential statistics, descriptive statistics, estimation, point estimation, interval estimation, hypothesis testing, dependency, central tendency, dispersion, skewness, kurtosis, leptokurtic, mesokurtic, platykurtic, frequency table, mutually exclusive, collectively exhaustive, relative frequencies, cumulative frequency, histogram, Pareto chart, bell-shaped distribution, uniform distribution, skewed distribution, pie chart, pictogram, mean, median, mode, bimodal, midrange, reliability, symmetry, skewness, positive skewness, negative skewness, range, MSD, variance, deviation, standard deviation, z-value, Chebyshev's theorem, empirical rule, normal distribution, quartiles, quintiles, deciles, percentiles, interquartile range, stem-and-leaf plot, boxplot, biased, unbiased.

Skills/Procedures--given appropriate data, compute or identify the

Sample mean, median, mode, variance, standard deviation, and range

Estimated population mean, variance, and standard deviation

Kind of skewness, if any, present in the data set

z-value of any data item

Upper, middle, and lower quartiles

Percentile of any data item

Percentile of any integer z-value from -3 to +3

Concepts

Identify circumstances under which the median is a more suitable measure of central tendency than the

mean

Explain when the normal distribution (empirical rule) may be used

Explain when Chebyshev's Theorem may be used; when it should be used

Give an example (create a data set) in which the mode fails as a measure of central tendency

Give an example (create a data set) in which the mean fails as a measure of central tendency

Explain why the sum of the deviations fails as a measure of dispersion, and describe how this failure is

overcome

Distinguish between unbiased and biased estimators of population parameters

Describe how percentile scores are determined on standardized tests like the SAT or the ACT

Explain why the variance and standard deviation of a sample are likely to be lower than the variance and standard deviation of the population from which the sample was taken

Identify when the sample mean, variance, and standard deviation are identical to the population mean,

variance, and standard deviation

Part II

Basic Probability Concepts

Probability--the likelihood of an event

Probability is expressed as a decimal or fraction between zero and one, inclusive.

An event that is certain has a probability of 1.

An event that is impossible has a probability of 0.

If the probability of rain today (R) is 30%, it can be written P(R) = 0.3.

Objective probabilities--calculated from data according to generally-accepted methods

Relative frequency method--example: In a class of 25 college students there are 14 seniors.

If a student is selected at random from the class, the probability of selecting a senior is

14/25 or 0.56. Relative to the number in the class, 25, the number of seniors

(frequency), 14, is 56% or 0.56.

Subjective probabilities--arrived at through judgment, experience, estimation, educated guessing,

intuition, etc. There may be as many different results as there are people making the estimate.

(With objective probability, all should get the same answer.)

Boolean operations--Boolean algebra--(George Boole, 1815-1864)

Used to express various logical relationships; taught as "symbolic logic" in college philosophy and

mathematics departments; important in computer design

Complementation--translated by the word "not"--symbol: A¯ or A-bar

Complementary events are commonly known as "opposites."

Examples: Heads/Tails on a coin-flip; Rain/No Rain on a particular day; On Time/Late for work

Complementary events have two properties

Mutually exclusive--they cannot occur together; each excludes the other

Collectively exhaustive--there are no other outcomes; the two events are a complete or

exhaustive list of the possibilities

Partition--a set of more than two events that are mutually exclusive and collectively exhaustive

Examples: A, B, C, D, F, W, I--grades received at the end of a course; Freshman, Sophomore, Junior, Senior--traditional college student categories

The sum of the probabilities of complementary events, or of the probabilities of all the events

in a partition is 1.

Intersection--translated by the words "and," "with," or "but"--symbol: ∩ or, for typing convenience, n

A day that is cool (C) and rainy (R) can be designated (CnR).

If there is a 25% chance that today will be cool (C) and rainy (R), it can be written P(CnR) = 0.25.

Intersections are often expressed without using the word "and."

Examples: "Today might be cool with rain." or "It may be a cool, rainy day."

Two formulas for intersections:

For any two events A and B: P(AnB) = P(A|B)*P(B) ("|" is defined below.)

For independent events A and B: P(AnB) = P(A)*P(B)

This will appear later as a test for independence.

This formula may be extended to any number of independent events

P(AnBnCn . . . nZ) = P(A)*P(B)*P(C)* . . . P(Z)

The intersection operation has the commutative property

P(AnB) = P(BnA)

"Commutative" is related to the word "commute" which means "to switch."

The events can be switched without changing anything.

In our familiar algebra, addition and multiplication are commutative, but

subtraction and division are not.

Intersections are also called "joint (together) probabilities."

Union--translated by the word "or"--symbol: ∪ or, for typing convenience, u

A day that is cool (C) or rainy (R) can be designated (CuR).

If there is a 25% chance that today will be cool (C) or rainy (R), it can be written P(CuR) = 0.25.

Unions always use the word "or."

Addition rule to compute unions: P(AuB) = P(A) + P(B) - P(AnB)

The deduction of P(AnB) eliminates the double counting that occurs when P(A) is added to P(B).

The union operation is commutative: P(AuB) = P(BuA)

Condition--translated by the word "given"--symbol: |

A day that is cool (C) given that it is rainy (R) can be designated (C|R).

The event R is called the condition.

If there is a 25% chance that today will be cool (C) given that it is rainy (R),

it can be written P(C|R) = 0.25.

Conditions are often expressed without using the word "given."

Examples: "The probability that it will be cool when it is rainy is 0.25." [P(C|R) = 0.25.]

"The probability that it will be cool if it is rainy is 0.25." [P(C|R) = 0.25.]

"25% of the rainy days are cool." [P(C|R) = 0.25.]

All three of the above statements are the same, but the next one is different:

"25% of the cool days are rainy." This one is P(R|C) = 0.25.

The condition operation is not commutative: in general, P(A|B) ≠ P(B|A)

For example, it is easy to see that P(rain|clouds) is not the same as P(clouds|rain).

Conditional probability formula: P(A|B) = P(AnB) / P(B)

Occurrence Tables and Probability Tables

Occurrence table--table that shows the number of items in each category and

in the intersections of categories

Can be used to help compute probabilities of single events,

intersections, unions, and conditional probabilities

Probability table--created by dividing every entry in an occurrence table

by the total number of occurrences.

Probability tables contain marginal probabilities and joint probabilities.

Marginal probabilities--probabilities of single events, found in the right and bottom

margins of the table

Joint probabilities--probabilities of intersections, found in the interior part of the

table where the rows and columns intersect

Unions and conditional probabilities are not found directly in a probability table,

but they can be computed easily from values in the table.

Two conditional probabilities are complementary if they have the same condition and the events

before the "bar" (|) are complementary. For example, if warm (W) is the opposite of cool,

then (W|R) is the complement of (C|R), and P(W|R) + P(C|R) = 1.

In a 2 x 2 probability table, there are eight conditional probabilities, forming four pairs

of complementary conditional probabilities.

It is also possible for a set of conditional probabilities to constitute a partition

(if they all have the same condition, and the events before the "bar" are a partition).
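
The steps from occurrence table to probability table, and from there to unions and conditional probabilities, can be sketched in code. The counts below are hypothetical (100 days classified as cool or warm, rainy or dry), and all function names are illustrative only.

```python
# Hypothetical occurrence table: number of days in each intersection.
occurrences = {
    ("C", "R"): 20,   # cool and rainy
    ("C", "D"): 30,   # cool and dry
    ("W", "R"): 10,   # warm and rainy
    ("W", "D"): 40,   # warm and dry
}
total = sum(occurrences.values())

# Probability table: every entry divided by the total number of occurrences.
joint = {cell: count / total for cell, count in occurrences.items()}

def marginal(event):
    """Probability of a single event: sum of the joint entries containing it."""
    return sum(p for cell, p in joint.items() if event in cell)

def joint_prob(a, b):
    """P(AnB); the intersection is commutative, so either order is accepted."""
    key = (a, b) if (a, b) in joint else (b, a)
    return joint[key]

def union(a, b):
    """Addition rule: P(AuB) = P(A) + P(B) - P(AnB)."""
    return marginal(a) + marginal(b) - joint_prob(a, b)

def conditional(a, given):
    """P(A|B) = P(AnB) / P(B): a joint probability divided by a marginal."""
    return joint_prob(a, given) / marginal(given)

print(marginal("C"), union("C", "R"), conditional("C", "R"), conditional("R", "C"))
```

With these counts P(C|R) is about 0.67 while P(R|C) is 0.40, which illustrates that the condition operation is not commutative, and P(C|R) + P(W|R) = 1 because the two are complementary.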

Testing for Dependence/Independence

Statistical dependence

Events are statistically dependent if the occurrence of one event

affects the probability of the other event.

Identifying dependencies is one of the most important tasks of statistical analysis.

Tests for independence/dependence

Conditional probability test--posterior/prior test

Prior and posterior are the Latin words for "before" and "after."

A prior probability is one that is computed or estimated before additional information is obtained.

A posterior probability is one that is computed or estimated after additional information is obtained.

Prior probabilities are probabilities of single events, such as P(A).

Posterior probabilities are conditional probabilities, such as P(A|B).

Independence exists between any two events A and B if P(A|B) = P(A)

If P(A|B) = P(A), the occurrence of B has no effect on P(A)

If P(A|B) ≠ P(A), the occurrence of B does have an effect on P(A)

Positive dependence if P(A|B) > P(A) -- posterior greater than prior

Negative dependence if P(A|B) < P(A) -- posterior less than prior

Multiplicative test--joint/marginal test

Independence exists between any two events A and B if P(AnB) = P(A)*P(B)

Positive dependence if P(AnB) > P(A)*P(B) -- intersection greater than product

Negative dependence if P(AnB) < P(A)*P(B) -- intersection less than product
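
Both tests can be written as short functions. This is a sketch with hypothetical names and probabilities; math.isclose is used instead of exact equality only because computer arithmetic with decimals is not exact.

```python
import math

def conditional_test(p_a_given_b, p_a):
    """Posterior/prior test: compare P(A|B) with P(A)."""
    if math.isclose(p_a_given_b, p_a):
        return "independent"
    return "positive dependence" if p_a_given_b > p_a else "negative dependence"

def multiplicative_test(p_a_and_b, p_a, p_b):
    """Joint/marginal test: compare P(AnB) with P(A)*P(B)."""
    product = p_a * p_b
    if math.isclose(p_a_and_b, product):
        return "independent"
    return "positive dependence" if p_a_and_b > product else "negative dependence"

# Hypothetical values: P(C) = 0.5, P(R) = 0.3, P(CnR) = 0.2
p_c, p_r, p_c_and_r = 0.5, 0.3, 0.2
p_c_given_r = p_c_and_r / p_r

print(conditional_test(p_c_given_r, p_c))
print(multiplicative_test(p_c_and_r, p_c, p_r))
```

The two tests always agree; here both report positive dependence, since 0.67 exceeds 0.5 and 0.2 exceeds 0.15.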

Bayesian Inference--Thomas Bayes (1702-1761)

Bayes developed a technique to compute a conditional probability,

given the reverse conditional probability

Computations are simplified, and complex formulas can often be avoided, if a probability table is used.

Basic computation is: P(A|B) = P(AnB) / P(B), an intersection probability divided by

a single-event probability. That is, a joint probability divided by a marginal probability.

Bayesian analysis is very important because most of the probabilities upon which we base decisions

are conditional probabilities.
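
A Bayesian computation can be sketched by building the relevant column of the probability table and then dividing a joint probability by a marginal probability, as described above. The function name and the rain/clouds probabilities are hypothetical.

```python
def reverse_conditional(p_a, p_b_given_a, p_b_given_not_a):
    """Compute P(A|B) from P(A), P(B|A), and P(B|not A)."""
    joint_a_b = p_b_given_a * p_a                  # P(AnB)
    joint_not_a_b = p_b_given_not_a * (1 - p_a)    # P(not-A n B)
    p_b = joint_a_b + joint_not_a_b                # marginal probability P(B)
    return joint_a_b / p_b                         # joint divided by marginal

# Hypothetical values: P(rain) = 0.2, P(clouds|rain) = 0.9, P(clouds|no rain) = 0.3
p_rain_given_clouds = reverse_conditional(0.2, 0.9, 0.3)
print(round(p_rain_given_clouds, 4))
```

Here P(clouds|rain) is 0.9, but the reverse conditional P(rain|clouds) works out to about 0.43.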

Other Probability Topics:

Matching-birthday problem

Example of a "sequential" intersection probability computation, where each probability is revised slightly and complementary thinking is used

Complementary thinking--strategy of computing the complement (because it is easier) of what is

really needed, then subtracting from 1
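
The matching-birthday computation can be sketched as a loop. It assumes 365 equally likely birthdays and ignores leap years; the function name is hypothetical.

```python
def prob_shared_birthday(people, days=365):
    """Probability that at least two of the given number of people share a birthday."""
    p_all_different = 1.0
    for k in range(people):
        # Each new person must avoid the k birthdays already taken.
        p_all_different *= (days - k) / days
    return 1 - p_all_different     # complementary thinking

print(round(prob_shared_birthday(23), 4))
```

With 23 people the probability of at least one match is already slightly above one-half.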

Redundancy

Strategy of using back-ups to increase the probability of success

Usually employs complementary thinking and the extended multiplicative rule for independent events

to compute the probability of failure. P(Success) is then equal to 1 - P(Failure).
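
A redundancy computation might look like the sketch below. It assumes the units fail independently; the function name and the failure probabilities are hypothetical.

```python
import math

def prob_success_with_backups(failure_probs):
    """P(Success) when the system fails only if every independent unit fails."""
    p_all_fail = math.prod(failure_probs)   # extended multiplicative rule
    return 1 - p_all_fail                   # complementary thinking

# Hypothetical: one primary unit and two back-ups, each failing 10% of the time
print(prob_success_with_backups([0.1, 0.1, 0.1]))
```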

Permutations and Combinations

Permutation--a set of items in which the order is important

Without replacement--duplicate items are not permitted

With replacement--duplicate items are permitted

Combination--a set of items in which the order is not important

Without replacement--duplicate items are not permitted

With replacement--duplicate items are permitted

In the formulas, "n" designates the number of items available, from which "r" is the number that will be chosen.

(Can r ever exceed n?)

To apply the correct formula when confronting a problem, two decisions must be made:

Is order important or not? Are duplicates permitted or not?

Permutations, both with and without replacement, can be computed by using the "sequential" method

instead of the formula. This provides a way of verifying the formula result.
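
The two decisions (order, duplicates) select one of four counting formulas. The sketch below uses math.perm and math.comb from Python's standard library; the function names are hypothetical, and the with-replacement formulas shown (n to the power r, and the combinations of n+r-1 items taken r at a time) are the standard ones, which may be laid out differently on the course formula sheet.

```python
import math

def count_arrangements(n, r, ordered, replacement):
    """Number of ways to choose r items from n under the two decisions."""
    if ordered and replacement:
        return n ** r                      # permutations with replacement
    if ordered:
        return math.perm(n, r)             # permutations without replacement
    if replacement:
        return math.comb(n + r - 1, r)     # combinations with replacement
    return math.comb(n, r)                 # combinations without replacement

def sequential_permutations(n, r, replacement=False):
    """Sequential method: multiply the number of choices available at each position."""
    total = 1
    for position in range(r):
        total *= n if replacement else n - position
    return total

print(count_arrangements(5, 3, ordered=True, replacement=False))
print(sequential_permutations(5, 3))
```

Both approaches give 60 for n = 5 and r = 3 without replacement. Without replacement the count is zero whenever r exceeds n; with replacement r may exceed n.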

Lotteries

Usually combination ("Lotto") or permutation ("Pick 3 or 4") problems

Lotto games are usually without replacement--duplicate numbers are not possible

Pick 3 or 4 games are usually with replacement--duplicate numbers are possible

Poker hands

Can be computed using combinations and the relative frequency method

Can also be computed sequentially
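A small Python sketch showing that the two approaches agree for one simple hand: five cards that are all hearts. The choice of hand is arbitrary, and exact fractions are used so the comparison is not affected by rounding.

```python
from fractions import Fraction
from math import comb

total_hands = comb(52, 5)   # all possible five-card hands

# Combination method: favorable hands divided by total hands
# (relative frequency).
p_all_hearts_comb = Fraction(comb(13, 5), total_hands)

# Sequential method: probability that each successive card is a heart,
# revising the counts after every draw.
p_all_hearts_seq = Fraction(1)
for drawn in range(5):
    p_all_hearts_seq *= Fraction(13 - drawn, 52 - drawn)
```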

Terminology

PROBABILITY:

probability, experiment, event, simple event, compound event, sample space, relative frequency method, classical approach, law of large numbers, random sample, impossible event probability, certain event probability, complement, partition, subjective probability, occurrence table, probability table, addition rule for unions, mutually exclusive, collectively exhaustive, redundancy, multiplicative rule for intersections, tree diagram, statistical independence/dependence, conditional probability, Bayes' theorem, acceptance sampling, simulation, risk assessment, Boolean algebra, complementation, intersection, union, condition, marginal probabilities, joint probabilities, prior probabilities, posterior probabilities, two tests for independence, triad, complementary thinking, commutative.

PERMUTATIONS AND COMBINATIONS:

permutations, permutations with replacement, sequential method, combinations, combinations with replacement.

Skills/Procedures--given appropriate data,

PROBABILITY

prepare an occurrence table

prepare a probability table

compute the following 20 probabilities

4 marginal probabilities (single simple events)

4 joint probabilities (intersections)

4 unions

8 conditional probabilities--identify the 4 pairs of conditional complementary events

identify triads (one unconditional and two conditional probabilities in each triad)

conduct the conditional (prior/posterior) probability test for independence / dependence

conduct the multiplication (multiplicative) (joint/marginal) test for independence / dependence

identify positive / negative dependency

identify Bayesian questions

use the extended multiplicative rule to compute probabilities

use complementary thinking to compute probabilities

compute the probability of "success" when redundancy is used

compute permutations and combinations with and without replacement

Concepts

PROBABILITY

give an example of two or more events that are not mutually exclusive

give an example of two or more events that are not collectively exhaustive

give an example of a partition--a set of three or more events that are mutually exclusive and

collectively exhaustive

express the following in symbolic form using F for females and V for voters in a retirement community

60% of the residents are females

30% of the residents are female voters

50% of the females are voters

75% of the voters are female

70% of the residents are female or voters

30% of the residents are male non-voters

25% of the voters are male

40% of the residents are male

identify which two of the items above are a pair of complementary probabilities

identify which two of the items above are a pair of complementary conditional probabilities

from the items above, comment on the dependency relationship between F and V

if there are 100 residents, determine how many female voters there would be if gender and voting were independent

explain why joint probabilities are called "intersections"

identify which two of our familiar arithmetic operations and which two Boolean operations are commutative

tell what Thomas Bayes is known for (not English muffins)

PERMUTATIONS AND COMBINATIONS:

give an example of a set of items that is a permutation

give an example of a set of items that is a combination

tell if, in combinations/permutations, "r" can ever exceed "n"

Part III

Permutations and Combinations (outline, etc. Repeated from Part II)

Permutation--a set of items in which the order is important

Without replacement--duplicate items are not permitted

With replacement--duplicate items are permitted

Combination--a set of items in which the order is not important

Without replacement--duplicate items are not permitted

With replacement--duplicate items are permitted

In the formulas, "n" designates the number of items available, from which "r" is the number that will be chosen.

(Can r ever exceed n?)

To apply the correct formula when confronting a problem, two decisions must be made:

Is order important or not? Are duplicates permitted or not?

Permutations, both with and without replacement, can be computed by using the "sequential" method

instead of the formula. This provides a way of verifying the formula result.

Lotteries

Usually combination ("Lotto") or permutation ("Pick 3 or 4") problems

Lotto games are usually without replacement--duplicate numbers are not possible

Pick 3 or 4 games are usually with replacement--duplicate numbers are possible

Poker hands

Can be computed using combinations and the relative frequency method

Can also be computed sequentially

Terminology

PERMUTATIONS AND COMBINATIONS:

permutations, permutations with replacement, sequential method, combinations, combinations with replacement.

Skills/Procedures--given appropriate data,

PERMUTATIONS AND COMBINATIONS:

decide when order is and is not important

decide when selection is done with replacement and without replacement

compute permutations with and without replacement using the permutation formula

compute combinations with and without replacement using the combination formula

use the sequential method to compute permutations with and without replacement

solve various applications problems involving permutations and combinations

give an example of a set of items that is a permutation

give an example of a set of items that is a combination

tell if, in combinations/permutations, "r" can ever exceed "n"

Mathematical Expectation

Discrete variable--one that can assume only certain values (often the whole numbers)

There is only a finite countable number of values between any two specified values.

Examples: the number of people in a room, your score on a quiz in this course, shoe sizes (certain fractions permitted), hat sizes (certain fractions permitted)

Continuous variable--one that can take on any value--there is an infinite number of values between any two specified values

Examples: your weight (can be any value, and changes as you breathe), the length

of an object, the amount of time that passes between two events, the amount of water

in a container (but if you look at the water closely enough, you find that it is made up

of very tiny pieces--molecules--so this last example is really discrete

at the submicroscopic level, but in ordinary everyday terms we would call it continuous)

Mean (expected value) of a discrete probability distribution

Probability distribution--a set of outcomes and their likelihoods

Mean is the probability-weighted average of the outcomes

Each outcome is multiplied by its probability, and these are added.

The result is not an estimate. It is the actual population value, because the probability distribution

specifies an entire population of outcomes. ("μ" may be used, without the estimation caret above it.)

The mean need not be a possible outcome, and for this reason

the term "expected value" can be misleading.

Variance of a discrete probability distribution

Variance is the probability-weighted average of the squared deviations

similar to MSD, except it's a weighted average

Each squared deviation is multiplied by its probability, and these are added.

The result is not an estimate. It is the actual population value, because the probability distribution

specifies an entire population of outcomes. ("σ²" may be used, without the estimation caret above it.)

Standard deviation of a discrete probability distribution--the square root of the variance

("σ" may be used, without the estimation caret ^ above it.)
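The probability-weighted averages described above can be sketched in Python. The distribution below is hypothetical, chosen so that the mean is not one of the possible outcomes.

```python
from math import sqrt

# Hypothetical distribution: outcome -> probability (probabilities sum to 1).
dist = {0: 0.2, 1: 0.5, 2: 0.3}

# Mean: each outcome multiplied by its probability, then added.
mean = sum(x * p for x, p in dist.items())

# Variance: each squared deviation multiplied by its probability, then added.
variance = sum((x - mean) ** 2 * p for x, p in dist.items())

# Standard deviation: square root of the variance.
std_dev = sqrt(variance)
```

Here the mean works out to 1.1, a value the variable can never actually take, which is why "expected value" can mislead.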

The Binomial Distribution

Binomial experiment requirements

Two possible outcomes on each trial

The two outcomes are (often inappropriately) referred to as "success" and "failure."

n identical trials

Independence from trial to trial--the outcome of one trial does not affect the outcome of any other trial

Constant p and q from trial to trial

p is the probability of the "success" event

q is the probability of the "failure" event; (q = (1-p) )

"x" is the number of "successes" out of the n trials.

Symmetry is present when p = q

When p < .5, the distribution is positively skewed (high outliers).

When p > .5, the distribution is negatively skewed (low outliers).

Binomial formula--for noncumulative probabilities

Cumulative binomial probabilities--computed by adding the noncumulative probabilities

Binomial probability tables--may show cumulative or noncumulative probabilities

If cumulative, compute noncumulative probabilities by subtraction

Parameters of the binomial distribution--n and p

Binomial formula: P(x) = [n! / (x!(n-x)!)] * p^x * q^(n-x)

Note that when x=n, the formula reduces to p^n, and when x=0, the formula reduces to q^n.

These are just applications of the multiplicative rule for independent events.
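A minimal Python sketch of the binomial formula and of cumulative probabilities obtained by adding noncumulative ones. The function names are illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Noncumulative probability of exactly x successes in n trials."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

def binomial_cdf(x, n, p):
    """Cumulative probability of x or fewer successes (adds the pmf values)."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

# Hypothetical experiment: n = 4 trials with p = 0.5.
p_two = binomial_pmf(2, 4, 0.5)          # exactly two successes
p_at_most_two = binomial_cdf(2, 4, 0.5)  # zero, one, or two successes
```

With p = q = 0.5 the probabilities for x and n - x match, which is one way to verify the symmetry noted above.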

The Normal Distribution

Normal distribution characteristics--center concentration and symmetry

Parameters of the normal distribution--μ (mu), mean; and σ (sigma), standard deviation

Z-value formula (four arrangements--for z, x, μ, and σ)

Normal distribution problems have three variables given, and the fourth must be

computed and interpreted.

Z-values determine areas (probabilities) and areas (probabilities) determine z-values--the normal

table converts from one to the other.

Normal distribution probability tables--our text table presents one-sided central areas
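The four arrangements of the z-value formula, and the conversion between z-values and areas, can be sketched with Python's standard library in place of the printed table. The numbers (μ = 100, σ = 15, x = 130) are hypothetical, and the function names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()   # standard normal: mean 0, standard deviation 1

def z_value(x, mu, sigma):
    return (x - mu) / sigma

def x_value(z, mu, sigma):
    return mu + z * sigma

def mu_value(x, z, sigma):
    return x - z * sigma

def sigma_value(x, z, mu):
    return (x - mu) / z

z = z_value(130, 100, 15)                 # x is two standard deviations above the mean
area_below = std_normal.cdf(z)            # z -> cumulative area to the left of z
central_area = area_below - 0.5           # area between the mean and z
z_back = std_normal.inv_cdf(area_below)   # area -> z
```

The central_area value corresponds to the kind of one-sided central area the outline says the text table presents; a cumulative table would show area_below instead.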

Two uses of the normal distribution

Normally-distributed phenomena

To approximate the binomial distribution--this application is far less important now that computers

and even small calculators can generate binomial probabilities

Binomial parameters (n and p) can be converted to normal parameters μ and σ

μ = np; σ² = npq; σ = √(npq)
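A rough Python comparison of an exact binomial probability with its normal approximation, using μ = np and σ = √(npq). The parameters are hypothetical, and the 0.5 continuity correction is an added refinement that the outline does not mention.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.5            # hypothetical binomial parameters
q = 1 - p
mu = n * p                 # mean of the binomial
sigma = sqrt(n * p * q)    # standard deviation of the binomial

# Exact P(x <= 55): add the noncumulative binomial probabilities.
exact = sum(comb(n, k) * p**k * q**(n - k) for k in range(56))

# Normal approximation with a continuity correction (assumption: 55 -> 55.5).
approx = NormalDist(mu, sigma).cdf(55.5)
```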

Terminology

MATHEMATICAL EXPECTATION: random variable, discrete variable, continuous variable, probability distribution, probability histogram, mean of a probability distribution, variance and standard deviation of a probability distribution, probability-weighted average of outcomes (mean), probability-weighted average of squared deviations (variance).

BINOMIAL DISTRIBUTION: binomial experiment, requirements for a binomial experiment, independent trials, binomial probabilities, cumulative binomial probabilities, binomial distribution symmetry conditions, binomial distribution skewness conditions, binomial distribution parameters, mean and variance of a binomial distribution

NORMAL DISTRIBUTION

normal distribution, normal distribution parameters, mean, standard deviation, standard normal distribution, z-value, reliability, validity

Skills/Procedures

MATHEMATICAL EXPECTATION:

compute the mean, variance, and standard deviation of a discrete random variable

solve various applications problems involving discrete probability distributions

BINOMIAL DISTRIBUTION:

compute binomial probabilities and verify results with table in textbook

compute cumulative binomial probabilities

compute binomial probabilities with p = q and verify symmetry

solve various application problems using the binomial distribution

NORMAL DISTRIBUTION -- given appropriate data,

determine a normal probability (area), given x, μ, and σ

determine x, given μ, σ, and the normal probability (area)

determine μ, given x, σ, and the normal probability (area)

determine σ, given x, μ, and the normal probability (area)

solve various applications problems involving the normal distribution

compute the sampling standard deviation (standard error) from the population standard deviation

and the sample size

solve various applications problems involving the central limit theorem

Concepts

MATHEMATICAL EXPECTATION

give an example (other than water) of something that looks continuous at a distance, but, when you get up close, turns out to be discrete

explain why "expected value" may be a misleading name for the mean of a probability distribution

describe how to compute a weighted average

BINOMIAL DISTRIBUTION:

explain why rolling a die is or is not a binomial experiment

explain why drawing red/black cards from a deck of 52 without replacement is or is not a binomial experiment

explain why drawing red/black cards from a deck of 52 with replacement is or is not a binomial experiment

NORMAL DISTRIBUTION

describe conditions under which the normal distribution is symmetric

describe the kind of shift in the graph of a normal distribution caused by a change in the mean

describe the kind of shift in the graph of a normal distribution caused by a change in the standard

deviation

explain why, as the sample size increases, the distribution of sample means clusters more and more

closely around the population mean

Part IV

Sampling Distributions

Sampling distribution of the mean--the distribution of the means of many samples of the same size

drawn from the same population

Central Limit Theorem--three statements about the sampling distribution of sample means:

1. Sampling distribution of the means is normal in shape, regardless of the population distribution

shape when the sample size, n, is large. (When n is small, the population must be normal in

order for the sampling distribution of the mean to be normal.) ("Large" n is usually taken

to mean 30 or more.)

2. Sampling distribution of the means is centered at the true population mean.

3. Sampling distribution of the means has a standard deviation equal to σ / √n.

This quantity is called the sampling standard deviation or the standard error (of the mean).

(The full name is "standard deviation of the sampling distribution of the mean(s).")

This quantity is represented by the symbol σx bar.

σx bar is less than σ because of the offsetting that occurs within the sample. The larger

the sample size n, the smaller the σx bar (standard error), because the larger the

n, the greater the amount of offsetting that can occur, and the sample means will

cluster more closely around the true population mean μ.
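A small simulation sketch of the three statements, assuming a population made up of the faces of one fair die (flat, not normal) and samples of size 30. The seed, sample size, and number of samples are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(1)   # fixed seed so the sketch is repeatable

faces = [1, 2, 3, 4, 5, 6]   # hypothetical population: one fair die
mu = mean(faces)             # true population mean
sigma = pstdev(faces)        # true population standard deviation
n = 30                       # sample size ("large" by the usual rule)

standard_error = sigma / sqrt(n)   # predicted spread of the sample means

# Draw many samples of size n and record each sample mean.
sample_means = [mean(random.choices(faces, k=n)) for _ in range(2000)]
center = mean(sample_means)      # should land near mu
spread = pstdev(sample_means)    # should land near standard_error
```

Rerunning with a larger n should shrink both standard_error and spread, since more offsetting can occur within each sample.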

Sampling standard deviation (σx bar or standard error)--key value for inferential statistics

Two uses of the standard error

Computing the error factor in interval estimation

Computing the test statistic (zc or tc) in hypothesis testing

Terminology

normal distribution, normal distribution parameters, mean, standard deviation, standard normal distribution, z-value, reliability, validity, sampling distribution, central limit theorem (three parts), sampling standard deviation, standard error, offsetting, effect of the sample size on the sampling standard deviation (standard error).

Skills/Procedures--given appropriate data,

determine a normal probability (area), given x, μ, and σ

determine x, given μ, σ, and the normal probability (area)

determine μ, given x, σ, and the normal probability (area)

determine σ, given x, μ, and the normal probability (area)

solve various applications problems involving the normal distribution

compute the sampling standard deviation (standard error) from the population standard deviation

and the sample size

solve various applications problems involving the central limit theorem

Concepts--

describe conditions under which the normal distribution is symmetric

describe the kind of shift in the graph of a normal distribution caused by a change in the mean

describe the kind of shift in the graph of a normal distribution caused by a change in the standard

deviation

explain why, as the sample size increases, the distribution of sample means clusters more and more

closely around the population mean

Part V

Interval Estimation--Large Samples

Four Types of Problems

Means--one-group; two-group

Columns one and two of the four-column formula sheet

Proportions--one-group; two-group

Columns three and four of the four-column formula sheet

Confidence level (confidence coefficient)--the probability that a confidence interval will actually contain the population parameter being estimated. (A confidence interval is a range of values that is likely to contain the population parameter being estimated.)

90%, 95%, and 99% are the most popular confidence levels, and correspond to z-values

of 1.645, 1.960, and 2.576, respectively.

Of these, 95% is the most popular, and is assumed unless another value is given.

Error (uncertainty) factors express precision, as in 40 ± 3.

Upper confidence limit--the point estimate plus the error factor, 43 in this example

Lower confidence limit--the point estimate minus the error factor, 37 in this example

Error factor is the product of the relevant z-value and the standard error: zt * σx bar.
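A minimal Python sketch of a one-group, large-sample confidence interval for a mean, assuming σ is known. The data (sample mean 40, σ = 12, n = 64) are hypothetical and were picked so the result is close to the 40 ± 3 example.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Return (lower limit, upper limit) for the population mean."""
    # Table z for a two-sided interval, e.g. about 1.960 for 95%.
    z_t = NormalDist().inv_cdf(0.5 + confidence / 2)
    standard_error = sigma / sqrt(n)
    error_factor = z_t * standard_error
    return x_bar - error_factor, x_bar + error_factor

lower, upper = confidence_interval(40, 12, 64)
```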

Required sample sizes for desired precision may be computed.

Increased precision means a lower error factor.

Precision can be increased by increasing the sample size, n.

Increasing n lowers the standard error, since the standard error = σ / √n.

Taken to the extreme, every member of the population may be sampled, in which case

the error factor becomes zero--no uncertainty at all--and the population parameter is

determined exactly.

Economic considerations--the high cost of precision

The factor by which n must increase is equal to the square of the factor by which precision is to increase.

To double the precision--to cut the error factor in half--the sample size must be quadrupled.

Doubling the precision may thus quadruple the cost.

To triple the precision--to cut the error factor to 1/3 of its previous value, n must be multiplied by 9.
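The cost of precision can be illustrated by solving error factor = z * σ / √n for n. This is a sketch under the assumption that σ is known; the values σ = 12 and z = 1.960 are hypothetical.

```python
from math import ceil

def required_sample_size(sigma, error_factor, z_t=1.960):
    """Smallest whole n for which z_t * sigma / sqrt(n) <= error_factor."""
    return ceil((z_t * sigma / error_factor) ** 2)

n_base = required_sample_size(12, 3.0)      # error factor of 3
n_doubled = required_sample_size(12, 1.5)   # error factor cut in half
n_tripled = required_sample_size(12, 1.0)   # error factor cut to one third
```

Because n is rounded up to a whole number, the ratios come out only approximately 4 and 9.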

Hypothesis Testing--Large Samples

Four Types of Problems--Four-column formula sheet

Means--one-group; two-group

Proportions--one-group; two-group

Null (H0) and alternate (Ha) hypotheses

Means, one-group

H0: μ = some value

Ha: μ ≠ that same value (two-sided test)

μ > that same value (one-sided test, high end, right side)

μ < that same value (one-sided test, low end, left side)

Means, two-group

H0: μ1 = μ2

Ha : μ1 ≠ μ2 (two-sided test)

μ1 > μ2 (one-sided test, high end, right side)

μ1 < μ2 (one-sided test, low end, left side)

Proportions, one-group

H0: π = some value

Ha : π ≠ that same value (two-sided test)

π > that same value (one-sided test, high end, right side)

π < that same value (one-sided test, low end, left side)

Proportions, two-group

H0: π1 = π2

Ha : π1 ≠ π2 (two-sided test)

π1 > π2 (one-sided test, high end, right side)

π1 < π2 (one-sided test, low end, left side)

Type I error

Erroneous rejection of a true H0

Probability of a Type I error is symbolized by α.

Type II error

Erroneous acceptance of a false H0

Probability of a Type II error is symbolized by β.

Selecting α--based on researcher’s attitude toward risk

α--the researcher's maximum tolerable risk of committing a type I error

0.10, 0.05, and 0.01 are the most commonly used.

Of these, 0.05 is the most common--known as "the normal scientific standard of proof."

Table-z (critical value); symbolized by zt; determined by the selected α value

α        2-sided z    1-sided z

0.10     1.645        1.282

0.05     1.960        1.645

0.01     2.576        2.326
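The table values can be reproduced from the standard normal distribution. A short Python sketch, with table_z as an illustrative name:

```python
from statistics import NormalDist

std_normal = NormalDist()

def table_z(alpha, two_sided=True):
    """Critical z-value: alpha is split between two tails when two-sided."""
    tail = alpha / 2 if two_sided else alpha
    return std_normal.inv_cdf(1 - tail)

# One row per alpha: (alpha, two-sided z, one-sided z), rounded to 3 places.
rows = [
    (alpha, round(table_z(alpha), 3), round(table_z(alpha, two_sided=False), 3))
    for alpha in (0.10, 0.05, 0.01)
]
```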

Calculated-z (test statistic); symbolized by zc

Fraction--"signal-to-noise" ratio

Numerator ("signal")--strength of the evidence against H0

Denominator ("noise")--uncertainty factor for the numerator

Rejection criteria

Two-sided test: |zc| >= |zt|; also p <= α

One-sided test: |zc| >= |zt|, AND zc and zt have the same sign; also p <= α
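A minimal Python sketch of a one-group, large-sample z test for a mean, assuming σ is known. The function name, the "side" argument, and the data are hypothetical; the decision compares the calculated z with the table z, with a sign check for one-sided tests.

```python
from math import sqrt
from statistics import NormalDist

def one_group_z_test(x_bar, mu_0, sigma, n, alpha=0.05, side="two"):
    """Return (z_c, z_t, p_value, reject) for H0: mu = mu_0."""
    std_normal = NormalDist()
    # Calculated z: signal (x_bar - mu_0) over noise (standard error).
    z_c = (x_bar - mu_0) / (sigma / sqrt(n))
    if side == "two":
        z_t = std_normal.inv_cdf(1 - alpha / 2)
        p_value = 2 * (1 - std_normal.cdf(abs(z_c)))
        reject = abs(z_c) >= abs(z_t)
    elif side == "right":
        z_t = std_normal.inv_cdf(1 - alpha)
        p_value = 1 - std_normal.cdf(z_c)
        reject = abs(z_c) >= abs(z_t) and z_c * z_t > 0   # same sign
    elif side == "left":
        z_t = -std_normal.inv_cdf(1 - alpha)
        p_value = std_normal.cdf(z_c)
        reject = abs(z_c) >= abs(z_t) and z_c * z_t > 0   # same sign
    else:
        raise ValueError("side must be 'two', 'right', or 'left'")
    return z_c, z_t, p_value, reject

# Hypothetical data: sample mean 52, H0 value 50, sigma 8, n = 64.
z_c, z_t, p_value, reject = one_group_z_test(52, 50, 8, 64)
```

With these numbers the two-sided test rejects H0 at α = 0.05, while a left-sided test on the same data would not, because the calculated z has the wrong sign.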