
|Lecture Outlines for Applied Business Statistics |

|Rene Leo E. Ordonez, Southern Oregon University |

|Summer 2010 |

| |

| |

| |

|Note: The problems below are based on the text Doing Statistics for Business with Excel, 2nd Edition, by Pelosi and Sandifer. |

|The problems in each section are courtesy of the same text. |

|Reference to Excel and Minitab (Statistical functions) |

|Introduction |

|Probability Distributions |

|Sampling Distributions and Confidence Intervals |

|Hypothesis Testing: An Introduction |

|Inferences: One Population (Hypothesis Testing) |

|Comparing Two Populations |

|Improving and Managing Quality |

|Experimental Design and Analysis of Variance (ANOVA) |

|Analysis of Qualitative Data (Chi-square) |

|Regression and Correlation |

|Sample Midterm Exam |

|Sample Final Exam |

|Statistical Procedure |EXCEL 2007 |MINITAB |

|Descriptive Statistics (mean, median, etc.) |Data > Data Analysis > Descriptive Statistics |Stat > Basic Statistics > Display Descriptive Statistics |

| | | |

|Confidence Interval Estimates | | |

|Mean |Data > Data Analysis > Descriptive Statistics |Stat > Basic Statistics > 1 Sample t (or 1 Sample z) |

|Proportion |NONE |Stat > Basic Statistics > 1 Proportion |

| | | |

|One Population Hypothesis Test | | |

|Mean |Data > Data Analysis > Descriptive Statistics |Stat > Basic Statistics > 1 Sample t (or 1 Sample z) |

|Proportion |NONE |Stat > Basic Statistics > 1 Proportion |

| | | |

|Two Populations Hypothesis Test |Data > Data Analysis > | |

|Means of 2 Dependent Samples |t-Test: Paired Two Sample for Means |Stat > Basic Statistics > Paired t |

|Means of 2 Independent Samples (small samples, equal vars.) |t-Test: Two Sample Assuming Equal Variances |Stat > Basic Statistics > 2 Sample t |

|Means of 2 Independent Samples (small samples, unequal vars) |t-Test: Two Sample Assuming Unequal Variances |Stat > Basic Statistics > 2 Sample t |

|Means of 2 Independent Samples (large samples) |z-Test: Two Sample for Means |Stat > Basic Statistics > 2 Sample z |

| | | |

|Variances of 2 Populations |F-Test Two-Sample for Variances |Stat > Basic Statistics > 2 Variances |

| | | |

|Proportion of 2 Populations |NONE |Stat > Basic Statistics > 2 Proportions |

| | | |

|Analysis of Variance |Data > Data Analysis > | |

|One Factor (use for comparing 2 or more population means) |Anova: Single Factor |Stat > ANOVA > Oneway Unstacked (or Stacked) |

|Two Factor With Replication |Anova: Two-Factor With Replication |Stat > ANOVA > Twoway |

|Two Factor Without Replication |Anova: Two-Factor Without Replication |Stat > ANOVA > Twoway |

|Interaction Effect Plot |NONE |Stat > ANOVA > Interactions Plot |

| | | |

|Chi-square Analysis | | |

|Goodness of Fit Test |NONE |NONE |

|Comparing Proportions of Two or More Groups |NONE |Stat > Tables > Cross Tabulation (for raw data) |

|Testing Independence of Two Nominal Variables |NONE |Stat > Tables > Chisquare Test (for summarized data) |

| | | |

|Comparing Variances of Two Populations |Data > Data Analysis > F-Test Two-Sample for Variances |Stat > Basic Statistics > 2 Variances |

| | | |

|Regression and Correlation Analysis |Data > Data Analysis > Regression |Stat > Regression |

| | |Fitted Line Plot |

| | |Regression |

| | |Residual Plots |

STATISTICAL PROCESSING USING MINITAB

A free 30-day trial copy of the full commercial version of Minitab can be downloaded from

Basic Statistics

Confidence Interval Estimation

Hypothesis Testing

One-way and Two-way ANOVA

Chi-square Tests

Regression and Correlation Analysis

STATISTICAL PROCESSING USING EXCEL

Data > Data Analysis

Important Note:

If you don’t see the Data Analysis option under the Data tab, you have to add it in. Here are the steps:

1. Click the Microsoft Office Button (upper left corner), then select Excel Options

2. Click Add-Ins, then click Analysis ToolPak - VBA, then Go

3. Select Analysis ToolPak and Analysis ToolPak - VBA, then click OK

Introduction

1. What is statistics? A science that deals with rules and procedures that govern how to:

collect → summarize → describe → interpret data

2. Why study statistics?

Decisions! Decisions! Decisions!

3. The Importance of understanding probability

Some 'real life' examples (it’s just a game!)

Monty Hall Dilemma

Suppose you're on a game show, and you're given the choice of three doors. Behind one door is a car; behind the others, goats. You pick a door, say number 1, and the host, who knows what's behind the doors, opens another door, say number 3, which has a goat. He says to you, "Do you want to pick door number 2?" Is it to your advantage to switch your choice of doors? (Craig F. Whitaker, Columbia, MD)

Three Shell Game

Operator: Step right up, folks. See if you can guess which shell the pea is under. Double your money if you win.

After playing the game a while, Mr. Mark decided he couldn't win more than once out of three.

Operator: Don't leave, Mac. I'll give you a break. Pick any shell. I'll turn over an empty one. Then the pea has to be under one of the other two, so your chances of winning go way up.

4. Ways of assigning (determining) probabilities

Subjective -- describes an individual's personal judgment about how likely a particular event is to occur. It is not based on any precise computation but is often a reasonable assessment by a knowledgeable person.

Relative -- relative probability is another term for proportion; it is the value calculated by dividing the number of times an event occurs by the total number of times an experiment is carried out.

Objective (classical) -- probability based on the symmetry of games of chance or similar situations. For example:

Coin tossing experiment → P(head)

Die tossing experiment → P(“one”)

Monty Hall Dilemma → P(win)
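As an illustration of the classical approach, P(win) in the Monty Hall Dilemma can be found by listing the equally likely cases. The sketch below is only one way to do this; it assumes the car is equally likely to be behind any door and that the host always opens a goat door that the contestant did not pick. The function name is made up for this example.

```python
from fractions import Fraction
from itertools import product

DOORS = (1, 2, 3)

def monty_hall_probabilities():
    """Enumerate every equally likely (car, first pick) pair and count wins."""
    stay_wins = 0
    switch_wins = 0
    cases = list(product(DOORS, DOORS))  # 9 equally likely cases
    for car, pick in cases:
        if pick == car:
            # The first pick was the car, so staying wins.
            stay_wins += 1
        else:
            # The host must open the only other goat door, so the
            # remaining unopened door hides the car: switching wins.
            switch_wins += 1
    total = len(cases)
    return Fraction(stay_wins, total), Fraction(switch_wins, total)

p_stay, p_switch = monty_hall_probabilities()
print("P(win | stay)   =", p_stay)
print("P(win | switch) =", p_switch)
```

Under these assumptions the same counting applies to the Three Shell Game, since the operator's offer has the same structure as the host's.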

5. Important statistical terms and concepts

(KNOW THESE DEFINITIONS AND SYMBOLS!)

• Population vs. sample

population -- any entire collection of people, animals, plants or things from which we may collect data. It is the entire group we are interested in, which we wish to describe or draw conclusions about

sample -- a group of items selected from a population. Conclusions about the population are drawn by studying the sample.

• Parameter vs. statistics

parameter – a numeric characteristic of a population

statistic – a numeric characteristic of a sample. It is used to estimate an unknown population parameter

Parameters are often assigned Greek letters (e.g. μ, σ, π), whereas statistics are assigned Roman letters (e.g. s, p).

• Common measures of central tendency

Mean, median, mode

• Common measures of dispersion

Range, variance, standard deviation

• Common symbols used in statistics

Important Symbols: Must Know

| |POPULATION |SAMPLE |

|Size |N |n |

|One Population | | |

| Mean |μ |x̄ |

| Variance |σ² |s² |

| Standard deviation |σ |s |

| Proportion |π |p |

|Two Populations | | |

| Comparing Means |μ₁ vs μ₂ |x̄₁ vs x̄₂ |

| Comparing Proportions |π₁ vs π₂ |p₁ vs p₂ |

| Comparing Variances |σ₁² vs σ₂² |s₁² vs s₂² |

| Comparing Standard Deviations |σ₁ vs σ₂ |s₁ vs s₂ |

Probability Distributions

1. Definitions

• probability – likelihood or chance of an event occurring

• experiment -- any process or study which results in the collection of data, the outcome of which is unknown.

• random variable -- an outcome of an experiment. It need not be a number; for example, the outcome when a coin is tossed can be 'heads' or 'tails'. However, we often want to represent outcomes as numbers. A random variable is usually denoted by the letter “X”.

Example:

- Toss a coin 5 times (experiment),

observe the number of heads (variable)

- Randomly select 20 students (sample),

record each student’s GPA (variable)

2. Random variables

• discrete (185) -- usually involves counting (e.g. number of defectives, number of correct answers, etc.). If a random variable can take only a finite number of distinct values, then it must be discrete.

- in the coin tossing experiment above, the random variable is “number of heads”: x = {0, 1, 2, 3, 4, 5}

• continuous (186) -- usually involves something that is measured. A continuous random variable is one that can take any value in an interval, and so has an infinite number of possible values. Examples include height, weight, the amount of sugar in an orange, and the time required to run a mile.

- in the student sampling above, the random variable is GPA: x = 0 to 4.0

3. Common Discrete Probability Distributions

• Uniform

• Binomial (191)

The trials must meet the following requirements:

a) the total number of trials (n) is fixed in advance;

b) there are just two outcomes of each trial; success and failure;

c) the outcomes of all the trials are statistically independent;

d) all the trials have the same probability of success

Example: coin tossing

• Hypergeometric (200)

a) each trial has just two outcomes; success and failure;

b) the outcomes of all the trials are statistically dependent;

c) the probability of success changes from trial to trial

• Poisson (203)-- typically, a Poisson random variable is a count of the number of events that occur in a certain time interval or spatial area. For example, the number of cars passing a fixed point in a 5 minute interval; the number of calls received by a switchboard during a given period of time
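To make the binomial case concrete, the sketch below computes the distribution of the number of heads in the 5-toss coin experiment used earlier. It assumes a fair coin (p = 0.5); `binomial_pmf` is a name chosen for this illustration, not a function from the text.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) when X counts successes in n independent trials,
    each with the same success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 5, 0.5  # 5 tosses of a fair coin (assumed)
distribution = {k: binomial_pmf(k, n, p) for k in range(n + 1)}

for k, prob in distribution.items():
    print(f"P(X = {k}) = {prob:.5f}")
```

The probabilities over x = 0, 1, ..., 5 add up to 1, as they must for any discrete probability distribution.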

4. Common Continuous Probability Distributions

• Uniform (219)

[pic]

• Exponential (Not covered but will be introduced and covered in BA 380-Operations Management)

[pic]

• Normal (223 to 228)

5. The Normal Distribution (223 to 228)

• Characteristics

- bell-shaped

- mean = median = mode

- area underneath the curve equals 1

- symmetric about the mean (left side is mirror-image of right side)

- area left of mean = 0.50 = area right of mean

- asymptotic (the tails approach, but never touch, the horizontal axis)

6. How to find areas (probabilities) using the Normal table (229 to 234)

A MUST UNDERSTAND CONCEPT!

[pic]

|Standard Normal Distribution | |

|P(Z < −1.0) |P(−1.0 < Z < 1.0) |

|P(1.0 < Z < 2.5) |P(Z > −2.65) |
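Areas read from the Normal table can be cross-checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) and reads the four expressions above as P(Z < −1.0), P(−1.0 < Z < 1.0), P(1.0 < Z < 2.5) and P(Z > −2.65); the variable names are illustrative only.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# cdf(z) gives the area to the LEFT of z, like the cumulative Normal table.
p_left = Z.cdf(-1.0)                  # P(Z < -1.0)
p_middle = Z.cdf(1.0) - Z.cdf(-1.0)   # P(-1.0 < Z < 1.0)
p_between = Z.cdf(2.5) - Z.cdf(1.0)   # P(1.0 < Z < 2.5)
p_right = 1 - Z.cdf(-2.65)            # P(Z > -2.65)

for label, area in [("P(Z < -1.0)", p_left),
                    ("P(-1.0 < Z < 1.0)", p_middle),
                    ("P(1.0 < Z < 2.5)", p_between),
                    ("P(Z > -2.65)", p_right)]:
    print(f"{label} = {area:.4f}")
```

Drawing the curve and shading the area first is still worthwhile: it shows whether the answer should be above or below 0.50 before any number is computed.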

1) Given probabilities and their respective probability expressions, draw the normal distribution, shade the areas and find the corresponding z values:

Use Excel’s =NORMSINV(area) function or the Standard Normal Distribution table to answer the problems below.

|P(Z < z) = 0.95 |P(Z > z) = 0.95 |

|P(Z < z) = 0.25 |P(Z > z) = 0.25 |
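Going from an area back to a z value is the reverse lookup. As a rough Python counterpart to Excel's =NORMSINV(area), the sketch below uses `NormalDist.inv_cdf`, which takes the area to the left of z. For a "greater than" expression the left-tail area is 1 minus the given probability; the helper names are made up for this example.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def z_for_left_area(area):
    """z such that P(Z < z) = area."""
    return Z.inv_cdf(area)

def z_for_right_area(area):
    """z such that P(Z > z) = area, i.e. P(Z < z) = 1 - area."""
    return Z.inv_cdf(1 - area)

z_a = z_for_left_area(0.95)    # P(Z < z) = 0.95
z_b = z_for_right_area(0.95)   # P(Z > z) = 0.95
z_c = z_for_left_area(0.25)    # P(Z < z) = 0.25
z_d = z_for_right_area(0.25)   # P(Z > z) = 0.25

print(round(z_a, 3), round(z_b, 3), round(z_c, 3), round(z_d, 3))
```

Because the curve is symmetric about 0, the two answers in each row differ only in sign.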

Learning it! Exercises

The amount of money spent by students for textbooks in a semester is a normally distributed random variable with a mean of $235 and a standard deviation of $15

a) Sketch the normal distribution that describes the amount of money spent on textbooks in a semester.

b) What is the probability that a student spends between $220 and $250 in any semester?

c) What percentage of students spend more than $270 on textbooks in any semester?

d) What percentage of students spend less than $225 in a semester?
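One way to check hand calculations for the textbook-spending exercise is sketched below. It standardizes implicitly by building a normal distribution with the stated mean ($235) and standard deviation ($15); the variable names are illustrative.

```python
from statistics import NormalDist

spending = NormalDist(mu=235, sigma=15)  # values given in the exercise

# b) P(220 < X < 250): area between the two amounts
p_between = spending.cdf(250) - spending.cdf(220)

# c) P(X > 270): area to the right
p_more_than_270 = 1 - spending.cdf(270)

# d) P(X < 225): area to the left
p_less_than_225 = spending.cdf(225)

print(f"b) {p_between:.4f}")
print(f"c) {p_more_than_270:.4f}")
print(f"d) {p_less_than_225:.4f}")
```

By hand, the same answers come from converting each dollar amount to z = (x − 235) / 15 and using the Normal table.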

The actual amount of a certain brand of orange juice in a container marked half gallon is a normally distributed random variable with a mean of 65 oz. and a standard deviation of 0.35 oz.

a) What percentage of the containers contain more than 64.5 oz?

b) What percentage of the containers contain between 64 and 66 oz?

c) If federal law says that 98% of all the containers must be at or above the labeled weight, does this brand of orange juice meet the requirement?

The size of a gift/specialty store in a regional super mall is a normally distributed random variable with a mean of 8,500 sq ft and a standard deviation of 260 sq ft. What is the probability that a randomly selected gift/specialty store in a regional super mall is:

a) more than 8000 sq ft?

b) between 8300 and 9000 sq ft?

c) less than 9,500 sq ft?

Sampling Distributions and Confidence Intervals

1. The Distribution of the Sample Mean (x̄) and the Central Limit Theorem (266 to 269)

Central Limit Theorem Definition (271):

When randomly sampling from a population, the distribution of the sample mean (x̄) is:

- approximately normal regardless of the original population distribution, so long as the sample is large (at least 30; this sample size restriction is not required if the population is normal to begin with), with

- a mean μx̄ equal to μ, and

- a standard deviation σx̄ equal to σ/√n
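The theorem can be seen in a small simulation. The sketch below is an illustration under assumed values, not part of the text: it draws 5,000 samples of size n = 30 from a strongly skewed (exponential) population with mean 2 and standard deviation 2, then compares the mean and standard deviation of the sample means with μ and σ/√n.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)          # fixed seed so the run is repeatable

pop_mean = 2.0          # exponential population: mean 2, sd also 2 (assumed)
pop_sd = 2.0
n = 30                  # size of each sample
num_samples = 5000      # number of repeated samples

sample_means = []
for _ in range(num_samples):
    sample = [random.expovariate(1 / pop_mean) for _ in range(n)]
    sample_means.append(mean(sample))

mean_of_means = mean(sample_means)
sd_of_means = stdev(sample_means)
theoretical_sd = pop_sd / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (mu = {pop_mean})")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {theoretical_sd:.3f})")
```

A histogram of `sample_means` would look close to bell-shaped even though the individual observations are far from normal.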

2. Confidence Intervals

• Use: For estimating unknown population parameters

• Definition of confidence interval (290)

A range of values, computed from sample data, that contains the true population parameter with a stated probability (the confidence level)

e.g. P(L ≤ μ ≤ U) = 1 − α, where L and U are the lower and upper limits of the interval

• Components of confidence interval – point estimate and margin of error

a. Point estimate (290)

- A single number that is calculated from sample data

- Is used to estimate a population parameter

- e.g. the sample mean is a point estimate of the population mean; the sample proportion is a point estimate of the population proportion

| |POPULATION |SAMPLE |

|Size |N |n |

|One Population | | |

|Mean |μ |x̄ |

|Variance |σ² |s² |

|Standard deviation |σ |s |

|Proportion |π |p |

|Two Populations | | |

|Comparing Means |μ₁ vs μ₂ |x̄₁ vs x̄₂ |

|Comparing Proportions |π₁ vs π₂ |p₁ vs p₂ |

|Comparing Variances |σ₁² vs σ₂² |s₁² vs s₂² |

|Comparing Standard Deviations |σ₁ vs σ₂ |s₁ vs s₂ |

b. Margin of error (e)

- when added to and subtracted from the point estimate, gives the upper and lower limits of the range of values where the population parameter could be found

- is affected (or determined) by:

■ confidence level

■ sample size

■ population variability

• Interpreting the confidence interval (294 - 295)

Say that you computed a 95% confidence interval estimate for the mean of a certain population as 3.2 to 3.5

- correct interpretation: “We are 95% confident that the interval from 3.2 to 3.5 contains the true population mean”

- incorrect interpretation: “There is a 95% chance that the population mean is in the interval from 3.2 to 3.5” (the population mean is a fixed value; it is the interval that changes from sample to sample)

3. Computing a confidence interval for the population mean (μ)

• z-dist for large samples or σ known (290-297)

• t-dist for small samples and σ unknown (298-304)

4. Computing a confidence interval for qualitative data (the population proportion (π)) (305-310)

5. Sample Size Calculations (311-313)

• For estimating a population mean

• For estimating a population proportion

(Using the Normal distribution as an approximation to the Binomial distribution)

• Factors affecting sample size requirement

1) confidence level

2) variability of the population

3) acceptable level of margin of error

CONFIDENCE INTERVAL ESTIMATION

Confidence Interval for μ

σ Known

σ Unknown

Confidence Interval for π

SAMPLE SIZE (n) DETERMINATION

For estimating μ

For estimating π
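For reference, the standard textbook forms of these intervals and sample-size calculations are written out below. They are stated here as the usual formulas, using the symbols from the table of important symbols and e for the margin of error; the text's own presentation may differ in notation.

```latex
% Confidence interval for \mu, \sigma known (z-distribution)
\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}

% Confidence interval for \mu, \sigma unknown (t-distribution, n-1 degrees of freedom)
\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}

% Confidence interval for \pi (normal approximation to the binomial)
p \pm z_{\alpha/2}\,\sqrt{\frac{p(1-p)}{n}}

% Sample size for estimating \mu with margin of error e
n = \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^{2}

% Sample size for estimating \pi with margin of error e
n = \frac{z_{\alpha/2}^{2}\,\pi(1-\pi)}{e^{2}}
```

In each interval the term after the ± sign is the margin of error, which is why a higher confidence level or more variability widens the interval and a larger n narrows it. Sample-size results are always rounded up to the next whole number.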

STUDENT’S t-DISTRIBUTION TABLE

[pic]

Learning it! Exercises (by hand and using Excel)

[pic]

7.6

A manufacturer of a pain reliever claims that it takes an average of 12.75 minutes for a person to be relieved of headache pain after taking its pain reliever. The time to relief is normally distributed with a standard deviation of 0.5 minutes. A sample of 12 people is taken and the data are shown here:

|12.9 |

|1431 |1540 |1293 |1340 |

|1302 |1700 |1533 |1402 |

|1255 |1840 |1272 |1467 |

|1377 |1642 |1572 |1220 |

|1450 |1139 |1520 |1477 |

|1483 |1227 |1227 |1515 |

|1529 |1684 |1257 |1242 |

|1588 |1782 |1238 |1350 |

|1535 |1491 |1276 |1367 |

|1533 |1513 |1420 |1375 |

a. Find a 95% confidence interval for the average number of cars that pass this location on a daily basis. The standard deviation is assumed to be 165 cars.

b. The company has decided to open a store at this location only if there is a daily average of at least 1400 cars passing this location. Based on your confidence interval, would you advise the company to open a store at this location? Explain why or why not.
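A possible way to check part (a) outside Excel or Minitab is sketched below. It assumes the 40 four-digit values in the table above are the daily car counts, uses the stated standard deviation of 165 cars, and applies the z interval because σ is treated as known. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean

# Daily car counts, read row by row from the table (assumed to be the sample)
cars = [1431, 1540, 1293, 1340, 1302, 1700, 1533, 1402,
        1255, 1840, 1272, 1467, 1377, 1642, 1572, 1220,
        1450, 1139, 1520, 1477, 1483, 1227, 1227, 1515,
        1529, 1684, 1257, 1242, 1588, 1782, 1238, 1350,
        1535, 1491, 1276, 1367, 1533, 1513, 1420, 1375]

sigma = 165          # population standard deviation given in the problem
confidence = 0.95

n = len(cars)
x_bar = mean(cars)                                   # point estimate
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z for alpha/2 in each tail
margin = z * sigma / sqrt(n)                         # margin of error
lower, upper = x_bar - margin, x_bar + margin

print(f"n = {n}, sample mean = {x_bar:.1f}")
print(f"{confidence:.0%} CI: {lower:.1f} to {upper:.1f}")
```

For part (b), the decision turns on where 1400 falls relative to the printed interval.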

[pic]

7.20

The police department is concerned about the ability of officers to identify drunk drivers on the road. Before instituting a new training program they take a sample of 28 arrests and record the level of alcohol in the blood at the time of the arrest. Assume that the level of alcohol in the blood is normally distributed. The data are shown below:

|92 |93 |108 |173 |194 |133 |207 |

|127 |256 |184 |253 |159 |101 |133 |

|204 |182 |173 |105 |153 |150 |180 |

|209 |141 |151 |133 |147 |209 |252 |

a) Find a 90% confidence interval for the average alcohol level in the blood at the time of arrest

b) Find a 95% confidence interval for the average alcohol level in the blood at the time of arrest

Excel Solution

Data > Data Analysis > Descriptive Statistics

[pic]

[pic]

Minitab Solution

Stat > Basic Statistics > 1-Sample t > Options

[pic]

[pic]

7.21

A large amusement park has recently added 5 new rides, including a large roller coaster called Mind Eraser. Management is concerned about the waiting times on the new roller coaster. A random sample of 10 people is selected and the time (in minutes) that each person waits to ride the Mind Eraser is recorded and shown below:

|43 |80 |48 |61 |74 |

|66 |54 |72 |58 |68 |

a) Find a 95% confidence interval for the average waiting time for the Mind Eraser, assuming that the waiting time is normally distributed.

b) The park management thinks that if customers have to wait more than 60 minutes for a ride, then the park should increase the staff to reduce the waiting time. Based on your confidence interval, does the park need to increase the staff? Explain why or why not.

[pic]

7.25

I asked 100 imaginary friends (only to avoid the time and cost of data collection) the following question: “Do you regularly watch MTV’s Beavis and Butthead?” Of the 100 friends, 35 of them answered yes.

a) Calculate a 95% confidence interval for the viewership of this show.

b) MTV is considering canceling the show if less than one-third of the population regularly watches the show. Based on this information, what will MTV do?
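Since there is no Excel procedure for this interval, a short calculation such as the sketch below can stand in for it. It uses the large-sample (normal approximation) interval for a proportion with the numbers from the problem, 35 "yes" answers out of 100; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

yes, n = 35, 100
confidence = 0.95

p = yes / n                                          # sample proportion
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96
margin = z * sqrt(p * (1 - p) / n)                   # margin of error
lower, upper = p - margin, p + margin

print(f"p = {p:.2f}, margin of error = {margin:.4f}")
print(f"{confidence:.0%} CI: {lower:.4f} to {upper:.4f}")
```

Part (b) then comes down to whether one-third (about 0.333) lies inside or outside this interval.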

Minitab Solution

Stat > Basic Statistics > 1-Proportion

[pic]

[pic]

(No Excel procedure)

[pic]

7.31

How many stores must be sampled for the woman who wants to buy a ranch to be 95% confident that the error in estimating the average fat content per pound in steaks sold in the Portland, Maine area is at most 0.05 oz? The standard deviation of fat content is known to be 0.30 oz.

7.32

How many months must be sampled for analysts to be 99% confident that the error in estimating the average monthly price of peanut butter is at most $0.02? Assume the standard deviation is $0.035.
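The two sample-size problems above follow the same pattern, so a small helper can cover both. The sketch below applies the usual formula n = (z·σ/e)² and rounds up; the function name is made up for this illustration.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(confidence, sigma, margin):
    """Smallest n for which the margin of error is at most `margin`
    when estimating a mean with known standard deviation sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin) ** 2)  # always round UP

n_stores = sample_size_for_mean(0.95, sigma=0.30, margin=0.05)    # problem 7.31
n_months = sample_size_for_mean(0.99, sigma=0.035, margin=0.02)   # problem 7.32

print("7.31 stores needed:", n_stores)
print("7.32 months needed:", n_months)
```

Rounding up rather than to the nearest whole number guarantees the margin of error does not exceed the target.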

7.42

In an effort to improve the quality of the CD players that your company makes, you have started to sample the component parts that you purchase from an outside supplier. You will accept the shipment of parts only if fewer than 1% of the parts in the shipment are defective. Recognizing that you cannot test the entire shipment (or population), you select a sample of 25 components to test. You find 3 defectives in the sample.

a) Find a 90 percent confidence interval for the proportion of components in the population that are defective.

b) Based on your confidence interval, should you accept the shipment? Why or why not?

7.44

A hotel is studying the proportions of rooms that are not ready when customers check in to the hotel.

a) How many rooms must be in the sample for the hotel to be 95% confident that the margin of error is at most 1%?

b) How many rooms must be in the sample for the hotel to be 95% confident that the margin of error is at most 3%?
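For a proportion, the sample size depends on π itself, which the problem does not supply. The sketch below therefore assumes the conservative planning value π = 0.5, which gives the largest possible n; this is an assumption of the example, not something stated in the problem, and the function name is made up.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(confidence, margin, pi=0.5):
    """Smallest n for which the margin of error is at most `margin`.
    pi = 0.5 is the conservative choice when no estimate is available."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z**2 * pi * (1 - pi) / margin**2)

n_one_percent = sample_size_for_proportion(0.95, margin=0.01)
n_three_percent = sample_size_for_proportion(0.95, margin=0.03)

print("margin 1%:", n_one_percent)
print("margin 3%:", n_three_percent)
```

Loosening the margin of error from 1% to 3% cuts the required sample by roughly a factor of nine, because n varies with 1/e².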

[pic]

Hypothesis Testing: An Introduction

1. What is a hypothesis test? (327)

- a hypothesis is an idea, an assumption, or a theory about the behavior of one or more variables in one or more populations

- a hypothesis test is a statistical procedure that involves formulating a hypothesis and using sample data (n) to decide on the validity of the hypothesis, i.e. whether the sample is consistent with the hypothesis (in which case you continue to believe the hypothesis) or inconsistent with the hypothesis (in which case you choose not to believe it, or reject it)

important!

in statistical testing, regardless of the specific hypothesis that you are testing, the basic procedure is the same! Your understanding of the concepts introduced in this chapter is crucial for the remaining chapters!

2. Steps in performing hypothesis test: (328-332)

Step 1: Set up the null and alternative hypotheses

Step 2: Identify the significance level (α) for determining the critical value

Step 3: Identify the appropriate distribution

Step 4: Collect the sample data (for determining the computed value)

Step 5: Compare the computed value to the critical value (or the p-value to the significance level)

Step 6: Make a statistical conclusion (reject the null or fail to reject the null)

Step 7: Make a managerial conclusion (usually a statistical test is conducted to assist in a decision-making process)

3. Null vs. Alternative Hypotheses and decision rule (329)

Important things to remember about H0 and H1

- H0: null hypothesis and H1: alternative hypothesis

- H0 and H1 are mutually exclusive and collectively exhaustive

- H0 is always presumed to be true

- H1 has the burden of proof

- a random sample (n) is used to “reject H0” or to “fail to reject H0”

- If we conclude 'do not reject H0', this does not necessarily mean that the null hypothesis is true; it only suggests that there is not sufficient evidence against H0 in favor of H1. Rejecting the null hypothesis, then, suggests that the alternative hypothesis may be true.

- equality is always part of H0 (e.g. “=”, “≥”, “≤”)

“≠”, “<” and “>” are always part of H1

[pic]

Structure:

[pic]

4. Setting up the null and alternative hypotheses (Is the test two-tailed (non-directional) or one-tailed (directional)?) and establishing the Rejection Region (333)

- identify the parameter being tested (μ, π, σ²)

( determine how many populations are included in the test

( Is “the claim” the null hypothesis or the alternate hypothesis?

• In actual practice, the status quo is set up as H0

• If the claim is “boastful” the claim is set up as H1 (we apply the Missouri rule – “show me”). Remember, H1 has the burden of proof

• In problem solving, look for key words and convert them into symbols (see examples below)

Some Examples

|Keywords |Inequality Symbol |Part of: |

|Larger (or more) than |> |H1 |

|Smaller (or less) than |< |H1 |

|No more than |≤ |H0 |

|At least |≥ |H0 |

|Has increased |> |H1 |

|Is there a difference? |≠ |H1 |

|Has not changed |= |H0 |

|Has “improved”, “is better than”, “is more effective” |See note below |H1 |

• The direction of the test involving claims that use the words “has improved”, “is better than”, and the like will depend upon the variable being measured.

• For instance, if the variable involves the time for a certain medication to take effect, the words “better”, “improve” or “more effective” are translated as “<” (less than, i.e. a shorter time). If the variable is a test score, the same words are translated as “>” (greater than, i.e. higher test scores).

• The equality (≤, ≥, =) is always part of the null hypothesis.

5. Two types of error in hypothesis testing: Type 1 (α) vs. Type 2 (β) (330-333)

Statistical definitions:

Type 1 (α) – the probability of rejecting a TRUE H0

Type 2 (β) – the probability of failing to reject (or “accepting”) a FALSE H0

| |True Condition |

|Statistical Conclusion |H0 TRUE |H0 FALSE |

|Reject H0 |Type 1 (α) |Correct |

|Fail to reject H0 |Correct |Type 2 (β) |

More on Type 1 (α): in addition to its definition as “the probability of rejecting a TRUE H0”, it is also:

- known as the significance level of a test (or simply, the significance level)

- usually ranges between 0.01 and 0.10 (which level is ‘best’? see next subsection)

- used to generate the critical value for a test

- an area at the tail end of a distribution, and

- this area is known as the reject H0 region (or the rejection region)

- The critical value marks the boundary between the reject H0 and fail to reject H0 regions

Which should be avoided - Type 1 or 2 error?

- For a given sample size (n), there is a trade-off between Type 1 and Type 2 errors, that is, decreasing one will increase the other

- To decrease both types at the same time, a larger sample size must be taken

- However, because of cost, time, and practicality of sampling concerns, oftentimes we need to choose between Type 1 and Type 2 errors.

- Which should we decrease? Depends on the cost associated with each type of error
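The trade-off can be made visible with a small calculation. The sketch below computes β for an upper-tailed z test under purely hypothetical values (H0: μ ≤ 90, true mean 91, σ = 3); none of these numbers or names come from the text, they are chosen only to show the direction of the effect.

```python
from math import sqrt
from statistics import NormalDist

def type2_error(alpha, n, mu0=90.0, mu_true=91.0, sigma=3.0):
    """beta for an upper-tailed z test of H0: mu <= mu0
    when the true mean is actually mu_true (hypothetical values)."""
    se = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha)   # critical value for alpha
    cutoff = mu0 + z_crit * se                 # reject H0 when x_bar > cutoff
    # beta = P(fail to reject H0 | true mean is mu_true) = P(x_bar <= cutoff)
    return NormalDist(mu_true, se).cdf(cutoff)

for n in (30, 60):
    for alpha in (0.10, 0.05, 0.01):
        print(f"n = {n:2d}, alpha = {alpha:.2f}, beta = {type2_error(alpha, n):.3f}")
```

Within each sample size, β grows as α shrinks; moving from n = 30 to n = 60 lowers β at every α, which is the sense in which a larger sample reduces both risks.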

EXAMPLES:

In each of the examples below, the Type 1 and Type 2 errors are defined in non-statistical terms. Can you identify the ‘cost’ associated with each type of error? For instance, in criminal cases, the cost associated with a Type 1 error (that is, a jury convicting an innocent person) is the freedom, or worse yet, the life of the accused. Now compare this to the cost of a Type 2 error. As a society, which do we consider as worse?

Justice system - criminal and civil cases

H0: Innocent

H1: Guilty

| |True Condition |

|Statistical Conclusion |Innocent |Guilty |

|Reject H0 (Guilty) |Type 1 (α) – conclude that the accused is guilty when in fact the accused is innocent |Correct |

|Fail to reject H0 (Not guilty) |Correct |Type 2 (β) – conclude that the accused is not guilty when in fact the accused is guilty |

Business - quality control situations – process monitoring

H0: Process is in control

H1: Process is not in control

| |True Condition |

|Statistical Conclusion |Process OK |Process Not OK |

|Reject H0 (process not OK) |Type 1 (α) – conclude that the process is not in control when in fact it is |Correct |

|Fail to reject H0 (process OK) |Correct |Type 2 (β) – conclude that the process is OK when in fact it is not |

Business - quality control situations- quality assurance

H0: Lot of shipment is good

H1: Lot of shipment is not good

| |True Condition |

|Statistical Conclusion |Good Lot |Not Good Lot |

|Reject H0 (shipment is not good) |Type 1 (α) – conclude that the lot is not good when in fact it is (producer’s risk) |Correct |

|Fail to reject H0 (shipment is good) |Correct |Type 2 (β) – conclude that the shipment is good when in fact it is not (consumer’s risk) |

6. P-values (339)

- The probability value (p-value) of a statistical hypothesis test is the probability of getting a value of the test statistic as extreme as or more extreme than that observed by chance alone, if the null hypothesis H0 is true. (see example below)

- It is the smallest significance level at which the null hypothesis would be rejected for the observed sample.

- When used as a decision rule in hypothesis testing, the p-value is compared to the significance level (α). If the p-value is smaller, the conclusion is to reject the null hypothesis (or, we say that the result “is significant”).

- Here’s the decision rule using the p-value; it applies to ALL forms of hypothesis tests!

Reject H0 if p-value < α; otherwise, fail to reject H0. Remember this very important rule!

- Important interpretation! Small p-values suggest that the null hypothesis is unlikely to be true. The smaller the p-value, the more convincing is the rejection of the null hypothesis. It indicates the strength of the evidence for, say, rejecting the null hypothesis H0, rather than simply concluding 'reject H0' or 'do not reject H0'.
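For tests based on the z statistic, the p-value and the decision rule can be sketched as follows. The function names and the labels "two-sided", "less" and "greater" are choices made for this illustration; which tail to use depends on the direction of H1.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def p_value(z, alternative):
    """p-value for an observed z statistic, by direction of H1."""
    if alternative == "two-sided":   # H1: parameter != value
        return 2 * (1 - Z.cdf(abs(z)))
    if alternative == "less":        # H1: parameter < value
        return Z.cdf(z)
    if alternative == "greater":     # H1: parameter > value
        return 1 - Z.cdf(z)
    raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")

def decide(p, alpha):
    """Apply the rule: reject H0 when the p-value is smaller than alpha."""
    return "reject H0" if p < alpha else "fail to reject H0"

p = p_value(1.96, "two-sided")
print(f"p-value = {p:.4f} ->", decide(p, alpha=0.10))
```

The same `decide` step applies unchanged to t, chi-square and F tests once their p-values are available, for example from Excel or Minitab output.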

P-value Example:

example:[pic]

[pic]


TESTING A POPULATION MEAN (μ)

H0: μ = value      H0: μ ≥ value      H0: μ ≤ value

H1: μ ≠ value      H1: μ < value      H1: μ > value

Reject H0 if:      Reject H0 if:      Reject H0 if:

|Z| > Z_{α/2}      Z < −Z_{α}      Z > Z_{α}

|t| > t_{α/2, n−1}      t < −t_{α, n−1}      t > t_{α, n−1}

TESTING A POPULATION PROPORTION (π)

H0: π = value      H0: π ≥ value      H0: π ≤ value

H1: π ≠ value      H1: π < value      H1: π > value

Reject H0 if:      Reject H0 if:      Reject H0 if:

|Z| > Z_{α/2}      Z < −Z_{α}      Z > Z_{α}

TESTING A POPULATION VARIANCE (σ²)

H0: σ² = value      H0: σ² ≥ value      H0: σ² ≤ value

H1: σ² ≠ value      H1: σ² < value      H1: σ² > value

Reject H0 if:      Reject H0 if:      Reject H0 if:

χ² < χ²_{1−α/2} or      χ² < χ²_{1−α}      χ² > χ²_{α}

χ² > χ²_{α/2}

Learning it! Exercises

[pic] Setting up the Hypotheses and Determining Type I and II Levels

For items 8.7 to 8.27 below, do the following:

a) State the Null and Alternative hypotheses

b) State the consequence of a Type I error

c) State the consequence of a Type II error

d) Suggest a value for α, and justify your choice

5 Administrators at a small college are concerned that part-time evening students may not be familiar with all the services of the College. They wish to offer an orientation program to these students but recognize that most of the part-time students work during the day and are generally very busy. The administrators do not want to prepare an elaborate presentation if only a handful of part-time students will attend. Hence, they will conduct the orientation if more than 25% of the part-time students are interested in attending.

6 A company CEO is thinking about setting up an on-site day-care program for its employees. The CEO has stated that she will do so only if more than 80% of the employees favor such a decision. Set up the null and alternative hypotheses.

7 In an attempt to improve quality many manufacturers are developing partnerships with their suppliers. A local fast-food burger outfit has partnered with its supplier of potatoes. The burger outfit buys potatoes in bags that weigh 20 lbs. It wishes to set up the null and alternative hypotheses to test if the bags do weigh on average 20 lbs.

8 You are a connoisseur of chocolate chip cookies and you do not think that Nabisco’s claim that every bag of Chips Ahoy cookies has 1000 chocolate morsels is correct. Set up the null and alternative hypotheses to test this claim.

9 Antilock brake systems (ABS) have been hailed as a revolutionary safety feature. A study by the National Traffic Safety Administration looked at fatal accidents. The claim is that cars with ABS are in fewer fatal crashes than those without.

10 A college placement office wonders whether there is a difference between the average salary of engineering graduates and business school graduates.

11 Your new television has a 1-year warranty. You are given the option to buy a 3-year warranty and wonder if it is worth it. You wish to test the hypothesis that the average time before a problem occurs is more than 3 years

12 M&M/Mars claims that at least 20% of the M&M’s in each package are the new blue color

13 A computer center is arguing for more computers in the lab for students at a midsize college. The computer center at a university claims that the average amount of time that students spend on-line has increased from last year’s average of 1 hour per day.

14 It seems like you spend more money on groceries during the summer months when you eat more ice cream and drink more fluids. You know that you spend an average of $25 per week on groceries during the winter months. Set up the null and alternative hypotheses to decide if, on average, you spend more than this amount per week during the summer months.

15 M&M Mars claims that at least 20% of the M&M’s in each package are the new blue color. Set up the null and alternative hypotheses to test this claim.

17 The computer center at a university claims that the average amount of time that students spend on-line has increased from last year’s average of 1 hour per day. Set up the null and alternative hypotheses to test this claim.

(note: for the following problems use Excel’s Data Analysis Tools to generate the descriptive statistics to minimize hand calculation)

[pic]

Problem 8.1

The School Committee members of a midsize New England city agreed that a strict discipline code had caused an increase in the number of student suspensions. The number of suspensions for a sample of schools in this city for the periods September 1992 to February 1993 is shown below:

|School |Number of Suspensions |

|Central |245 |

|MCDI |1 |

|Chestnut |65 |

|Duggan |133 |

|Kennedy |97 |

|Forest Park |149 |

|Puttnam |1024 |

|Kiley |56 |

|Central Academy |254 |

|Commerce |114 |

|Bridge |7 |

The average number of suspensions for the previous year was 130.5, with a population standard deviation of 158.2.

a) Set up the null and alternative hypotheses to test if the average number of suspensions has changed.

b) Test your hypothesis using a significance level of 0.05.

c) Find the p-value.

d) Display the data to see if it is reasonable to assume that the underlying population distribution is normal.

e) Based on the p-value, what can you conclude about the average number of suspensions?

Minitab Solution

Stat > Basic Statistics > 1-Sample Z

[pic]

Minitab Output

[pic]

[pic]
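The same 1-Sample Z calculation can be sketched in Python as a cross-check on the Minitab run. This is only an illustration using the standard library; the variable names are assumptions, and a two-tailed alternative (the mean "has changed") is assumed.

```python
from math import sqrt
from statistics import NormalDist, mean

# Suspension counts from the table in Problem 8.1 (11 schools)
suspensions = [245, 1, 65, 133, 97, 149, 1024, 56, 254, 114, 7]

mu0 = 130.5    # previous year's average (hypothesized mean)
sigma = 158.2  # population standard deviation (given)
alpha = 0.05

n = len(suspensions)
x_bar = mean(suspensions)
z = (x_bar - mu0) / (sigma / sqrt(n))

# Two-tailed p-value: chance of a |Z| at least this large when H0 is true
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_h0 = p_value < alpha

print(f"x-bar = {x_bar:.1f}, z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Note that the Puttnam value (1024) is far from the rest of the data, which bears on the normality question in part (d).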

Problem 8.2

The Educational Testing Service (ETS) designs and administers the SAT exams. Recently the format of the exam changed and the claim has been made that the new exam can be completed in an average of 120 minutes. A sample of 50 new exam times yielded an average of 115 minutes. The standard deviation is assumed to be 2 minutes.

a) Set up the null and the alternative hypotheses to test if the average time to complete the exam has changed from 120 minutes.

b) Test your hypothesis using a significance level of 0.05

c) Find the p-value

d) Based on the p-value, what can you conclude about the average time to complete the new exam?

[pic]

Problem 8.28

A major manufacturer of glue products thinks it has found a way to make glue adhere longer than the current average of 90 days. The manufacturer wishes to see whether the glue products made this way have an average time to failure greater than 90 days. A sample of 30 tubes of new glue yields an average of 93 days before failing. The failure time is normally distributed with a standard deviation of 3 days.

a) Set up the null and the alternative hypotheses to test whether average time to failure is greater than 90 days.

b) Test your hypothesis using a significance level of 0.05

c) Find the p-value

d) Based on the p-value, what can you conclude about the average time to failure for the new product?
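For a problem stated with summary statistics only, such as Problem 8.28, the upper-tailed z test can be sketched as below. The function name `upper_tail_z_test` is hypothetical, and the sketch assumes the standard deviation of 3 days is the known population value.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_z_test(x_bar, mu0, sigma, n, alpha=0.05):
    """Return (z, p_value, reject_h0) for H0: mu <= mu0 vs. H1: mu > mu0."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    p_value = 1 - NormalDist().cdf(z)  # area in the upper tail beyond z
    return z, p_value, p_value < alpha

# Problem 8.28: 30 tubes, sample mean 93 days, sigma = 3 days, claim "greater than 90"
z, p_value, reject_h0 = upper_tail_z_test(x_bar=93, mu0=90, sigma=3, n=30)
print(f"z = {z:.3f}, p-value = {p_value:.2e}, reject H0: {reject_h0}")
```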

Inferences: One Population (Hypothesis Testing)

1. Testing the Mean (μ)

• z-dist for Large samples or σ known (334)

• t-dist for small samples and σ unknown (341)

2. Testing the Population Variance (σ²) (Not covered)

• χ² distribution

3. Testing the Population Proportion (π) (349)

• z-dist. (approx. to the Binomial dist.)

4. Hypothesis Testing using Minitab and Excel

[pic]

TESTING A POPULATION MEAN (μ)

H0: μ = value          H0: μ ≥ value          H0: μ ≤ value

H1: μ ≠ value          H1: μ < value          H1: μ > value

Reject H0 if:          Reject H0 if:          Reject H0 if:

|Z| > Z(α/2)           Z < −Z(α)              Z > Z(α)

|t| > t(α/2, n−1)      t < −t(α, n−1)         t > t(α, n−1)

TESTING A POPULATION PROPORTION (π)

H0: π = value          H0: π ≥ value          H0: π ≤ value

H1: π ≠ value          H1: π < value          H1: π > value

Reject H0 if:          Reject H0 if:          Reject H0 if:

|Z| > Z(α/2)           Z < −Z(α)              Z > Z(α)

TESTING A POPULATION VARIANCE (σ²)

H0: σ² = value                H0: σ² ≥ value           H0: σ² ≤ value

H1: σ² ≠ value                H1: σ² < value           H1: σ² > value

Reject H0 if:                 Reject H0 if:            Reject H0 if:

χ² < χ²(1−α/2, n−1)           χ² < χ²(1−α, n−1)        χ² > χ²(α, n−1)

or χ² > χ²(α/2, n−1)
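The z-based rejection rules for the two-tailed, lower-tailed, and upper-tailed cases can be written as one small function. This is a sketch only; the function name `reject_h0_z` and the `tail` labels are assumptions made for illustration, and the t and chi-square rules would need table values or a statistics package in place of `NormalDist`.

```python
from statistics import NormalDist

def reject_h0_z(z, alpha, tail):
    """Apply the z rejection rules; tail is 'two', 'lower', or 'upper'."""
    std_normal = NormalDist()
    if tail == "two":
        # Reject if |Z| exceeds the critical value with alpha/2 in each tail
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    if tail == "lower":
        # Reject if Z falls below the negative critical value
        return z < -std_normal.inv_cdf(1 - alpha)
    if tail == "upper":
        # Reject if Z exceeds the critical value with alpha in the upper tail
        return z > std_normal.inv_cdf(1 - alpha)
    raise ValueError("tail must be 'two', 'lower', or 'upper'")

# The same test statistic can lead to different decisions depending on H1
for tail in ("two", "lower", "upper"):
    print(tail, reject_h0_z(1.80, 0.05, tail))
```

With z = 1.80 and a significance level of 0.05, only the upper-tailed rule rejects H0, because 1.80 lies between the one-tailed critical value (about 1.645) and the two-tailed one (about 1.96).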

Learning it! Exercises

Problem 9.1

The cost of common goods and services in 5 cities is shown in the table below (USA Today):

|City |Aspirin (100) |Fast Food (hamburger, fries, soft drink) |Woman’s Haircut/Blow Dry |Toothpaste (6.4 oz) |

|Los Angeles |$7.69 |$4.15 |$20.11 |$2.42 |

|Tokyo |$35.93 |$7.62 |$76.24 |$4.24 |

|London |$9.69 |$5.80 |$44.35 |$3.63 |

|Sydney |$7.43 |$4.53 |$29.93 |$2.08 |

|Mexico City |$1.16 |$3.63 |$17.94 |$1.08 |

a. You have just returned from a business trip and you lost your receipt for the aspirin you purchased but would like to be reimbursed by your company (since you had taken the aspirin after a stressful business meeting!). You guesstimate a cost of $10.00. Your boss claims that the average cost of aspirin is less than $10.00. Using these data, can you “prove” your boss wrong? Conduct the necessary hypothesis test. Assume that all costs are normally distributed.

b. Based on these data, is there enough evidence to support your submitting a cost of $10.00 for the fast-food meal on your trip?

c. If you remove Tokyo from the data set do your answers to parts (a) and (b) change? What does this tell you about the effect of outliers on the hypothesis test of µ when you have a small sample?

[pic]

Problem 9.2

The marketing material for a New England Ski resort advertises that they can make snow whenever the temperature is 32°F or below. To demonstrate how often this happens their material includes the following line graph of the weekly average temperatures (See graph in text).

The data that generated the graph are shown below:

|Week |1 |

|All Energy Marketing Co |478.66 |

|Broad Street/ Energy One |450.24 |

|Global Energy Services |468.16 |

|Green Mountain Energy Partners |471.24 |

|KBC Energy Services |435.53 |

|Louis Dreyfus Energy Services |472.24 |

|National Fuel Resources |468.22 |

|NorAm Energy |442.20 |

|WEPCO Gas |443.52 |

|Western Gas Resources |457.81 |

[pic]

Problem 9.5

Computer centers at universities and colleges are certainly aware of the increased number of Web surfers. To begin to understand the demands that will be made on the computer center resources, one school studied 25 children in grades 7 to 12. The number of hours that these children spend on the Internet in 1 week is shown here:

|5.0 |4.4 |5.7 |5.6 |5.5 |5.2 |5.0 |4.8 |3.6 |4.1 |

|4.6 |4.9 |4.0 |6.7 |5.5 |5.4 |6.7 |5.8 |5.4 |4.8 |

|5.9 |5.1 |3.8 |4.1 |6.7 | | | | | |

Is there enough evidence to indicate that children spend more than an average of 5 hours per week Web surfing? Assume that the time spent Web surfing is normally distributed.
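A minimal sketch of the one-sample t test for Problem 9.5 follows, assuming a 0.05 significance level (the problem does not state one). Because the standard library has no t distribution, the critical value is hard-coded from a t table.

```python
from math import sqrt
from statistics import mean, stdev

# Weekly hours on the Internet for the 25 children in Problem 9.5
hours = [5.0, 4.4, 5.7, 5.6, 5.5, 5.2, 5.0, 4.8, 3.6, 4.1,
         4.6, 4.9, 4.0, 6.7, 5.5, 5.4, 6.7, 5.8, 5.4, 4.8,
         5.9, 5.1, 3.8, 4.1, 6.7]

mu0 = 5.0           # H0: mu <= 5 vs. H1: mu > 5
t_critical = 1.711  # t-table value for alpha = 0.05 (assumed), df = 24, upper tail

n = len(hours)
x_bar = mean(hours)
s = stdev(hours)    # sample standard deviation (n - 1 in the denominator)
t_stat = (x_bar - mu0) / (s / sqrt(n))
reject_h0 = t_stat > t_critical

print(f"x-bar = {x_bar:.3f}, s = {s:.3f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```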

9.6

A company that sells mail-order computer systems has been planning inventory and staffing based on an assumption that the variance of their weekly sales is 180 ($1000²). The weekly sales are normally distributed. The company selects 15 weeks at random from the past year and obtains the data (in thousands of dollars) shown below:

Weekly Sales: 191 222 222 223 223 225 227 228 229 232 234 234 236 244 253

a) What is the sample variance for these data?

b) Set up the hypotheses to test whether the population variance is different from 180.

c) At the 0.05 level of significance, what can you conclude about the company’s assumption?
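The variance test in Problem 9.6 can be sketched as follows. The test statistic is χ² = (n − 1)s² / σ0²; the two critical values are taken from a chi-square table for 14 degrees of freedom, since the Python standard library does not provide that distribution.

```python
from statistics import variance

# Weekly sales in thousands of dollars (Problem 9.6)
sales = [191, 222, 222, 223, 223, 225, 227, 228, 229, 232, 234, 234, 236, 244, 253]

sigma0_sq = 180        # hypothesized variance, H0: sigma^2 = 180
chi2_lower = 5.629     # chi-square table value, area 0.975 to the right, df = 14
chi2_upper = 26.119    # chi-square table value, area 0.025 to the right, df = 14

n = len(sales)
s_sq = variance(sales)                 # sample variance (n - 1 in the denominator)
chi2_stat = (n - 1) * s_sq / sigma0_sq

# Two-tailed rule: reject if the statistic falls outside the two critical values
reject_h0 = chi2_stat < chi2_lower or chi2_stat > chi2_upper

print(f"s^2 = {s_sq:.2f}, chi-square = {chi2_stat:.2f}, reject H0: {reject_h0}")
```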

9.7

In manufacturing, the amount of material that is wasted or lost during a process is very important. In preparing financial estimates, a company assumes that the percent material lost for its new process has a variance of 10 (%²). After the new process has been running for a month and appears to be stable, the cost analyst looks at the percent material lost and finds the following data:

Daily Loss: 10 12 12 13 14 14 18 19 19 20

a) What is the sample variance for these data?

b) Set up the hypotheses to test whether the actual variance is greater than the value the company has been assuming. Assume that the daily loss is normally distributed.

c) At the 0.05 level of significance, what can you conclude?

[pic]

9.10

Companies are increasingly concerned about employees playing video games at work. In addition to reducing productivity, this habit slows down networks and uses valuable storage space. A recent article stated that 80% of all employees play video games at work at least once a week. A large company that employs many engineers wonders if its employees are as bad as the article claims. If they are, the company will install software that detects and removes video games from the network. The company surveys (anonymously) 100 employees and finds that 85 of the employees surveyed have played video games at work in the past week.

a. Set up the null and alternative hypotheses to test whether the proportion of the company’s employees that play video games is greater than the proportion stated in the article.

b. At the 0.05 level of significance, test the hypotheses.

c. What do you recommend that the company do?

[pic]

9.11

An alumni office is interested in serving their alumni better in order to encourage more donations to the college. A survey of 200 alumni was conducted to determine whether half-day training sessions offered on the campus were of interest. If more than 75% of the alumni were interested, the college would start a program. The survey showed that 160 of the alumni surveyed were interested in such a program.

a. Set up the null and alternative hypotheses to test whether the college should implement the program.

b. At the 0.05 level of significance, test the hypotheses.

c. What do you recommend that the college do?

[pic]

9.12

A company that makes computer keyboards has specifications that allow it to produce a product that has a maximum of 3% defective. The company has been receiving more customer complaints than usual. A sample of 50 keyboards has 2 defectives.

a. Set up the null and alternative hypotheses to test whether the proportion of defective keyboards has exceeded the amount allowed by the specifications.

b. At the 0.05 level of significance, test the hypotheses.

c. What do you recommend that the company do?

Comparing Two Populations

1. Comparing Means of Two Populations (μ1 vs. μ2)

• Dependent vs. Independent Samples

• Comparing Means using Two Independent Samples

- Large samples (z-distribution) (365)

- Small samples (t-distribution) (375)

• Comparing Means using Two Dependent Samples (384)

- t-distribution

2. Comparing Proportions of Two Populations (π1 vs. π2) (371)

• Using the z-distribution as approximation to the Binomial

3. Comparing Variances of Two Populations (σ1² vs. σ2²) (404)

• Using the F-distribution

[pic] are available

COMPARING TWO POPULATION MEANS (μ1 vs. μ2)

H0: μ1 = μ2                H0: μ1 ≥ μ2                H0: μ1 ≤ μ2

H1: μ1 ≠ μ2                H1: μ1 < μ2                H1: μ1 > μ2

Reject H0 if:              Reject H0 if:              Reject H0 if:

|Z| > Z(α/2)               Z < −Z(α)                  Z > Z(α)

|t| > t(α/2, n1+n2−2)      t < −t(α, n1+n2−2)         t > t(α, n1+n2−2)

COMPARING TWO POPULATION PROPORTIONS (π1 vs. π2)

H0: π1 = π2                H0: π1 ≥ π2                H0: π1 ≤ π2

H1: π1 ≠ π2                H1: π1 < π2                H1: π1 > π2

Reject H0 if:              Reject H0 if:              Reject H0 if:

|Z| > Z(α/2)               Z < −Z(α)                  Z > Z(α)

COMPARING TWO POPULATION VARIANCES (σ1² vs. σ2²)

H0: σ1² = σ2²

H1: σ1² ≠ σ2²

Reject H0 if:

F > F(α/2, v1, v2)

Learning it! Exercises

[pic]

10.1

Many studies have been done comparing consumer behavior of men and women. One such on-going study concerns take-out food. In particular, the study focuses on whether there is a difference in the mean number of times per month that men and women buy take-out food for dinner. The most recent results of the study are shown below:

| |Population |

| |Men |Women |

|Sample Size |34 |28 |

|Sample Mean |25.6 |21.2 |

|Population Standard Deviation |4.2 |3.7 |

Because the study has so much historical data, information is known about the population standard deviations.

a. Set up the hypotheses to test whether there is a difference in the mean number of times per month that a person buys take-out food for dinner for men and women.

b. Use the Z test with known population variances to set up and perform the test. Use a level of significance of 0.05.

c. Find the p value for the test.

d. Do the data provide evidence that the mean number of times per month for men differs from that for women?

e. Does the choice of α in this case affect the decision?

10.2

Professional employees who work for large corporations often contend that the mean salary paid by a company differs by location in the United States. To test that claim, data were collected on financial analysts working for a large corporation at locations in New England and in the upper Midwest. Because there is an extensive history of salary data, the population standard deviations are available. The study found the following results:

| |Population |

| |New England |Upper Midwest |

|Sample Size |25 |20 |

|Sample Mean |22.3 |18.5 |

|Population Standard Deviation |1.5 |2.2 |

a) Set up the appropriate hypotheses to test whether the company’s analysts in New England were paid more, on the average, than those working in the upper Midwest.

b) Use the Z test with known population variances to set up and perform the test. Use a level of significance of 0.05.

c) Find the p value for the test.

d) Do the data support the contention that the mean pay for analysts in New England is higher than that of analysts in the upper Midwest?

[pic]

10.11

Having learned about the paired t test you realize that you really should have used the test for the data on software price comparison. The data are repeated below:

|Top Ten Business |Computability |PC Connection |

|Software Packages |Price ($) |Price ($) |

|Windows 95 Upgrade |88 |95 |

|Norton Anti-Virus |59 |70 |

|McAfee ViruScan |49 |60 |

|First Aid 97 Deluxe |54 |58 |

|Clean Sweep III |37 |37 |

|Norton Utilities |68 |75 |

|Netscape Navigator |45 |40 |

|MS Office Pro 97 Upgrade |300 |310 |

|First Aid 97 |32 |35 |

|Win Fax Pro |95 |95 |

a) Calculate the differences between the prices for each type of software package. Just looking at the differences, do you think that one company charges more than the other? Why or why not?

b) Calculate the average difference and the standard deviation of the differences.

c) Set up the hypotheses to test whether the mean difference in price between the two companies is zero.

d) Assuming that the data are normally distributed, at the 0.05 level of significance, is there a difference in the mean price of software for the two companies?

e) Did these results differ from the last time you analyzed the data? Why do you think this happened?

Minitab Solution

Stat > Basic Statistics > Paired t

[pic]

[pic]

Excel Solution

Data > Data Analysis > t-Test: Paired Two Sample for Means

[pic] [pic]

[pic]
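As a cross-check on the Minitab and Excel runs for Problem 10.11, the paired t statistic can be computed by hand in a few lines. This is a sketch only: the differences are taken as Computability minus PC Connection (an arbitrary choice), and the critical value comes from a t table.

```python
from math import sqrt
from statistics import mean, stdev

# Prices in dollars for the ten software packages in Problem 10.11
computability = [88, 59, 49, 54, 37, 68, 45, 300, 32, 95]
pc_connection = [95, 70, 60, 58, 37, 75, 40, 310, 35, 95]

# A paired test works on the within-pair differences
diffs = [c - p for c, p in zip(computability, pc_connection)]

n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)
t_stat = d_bar / (s_d / sqrt(n))

t_critical = 2.262  # t-table value for alpha/2 = 0.025, df = 9
reject_h0 = abs(t_stat) > t_critical  # two-tailed, H0: mean difference = 0

print(f"d-bar = {d_bar:.2f}, s_d = {s_d:.3f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```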

10.12

A hospital administrator is concerned about the length of time that the nursing staff washes their hands. A recent study in health care showed that longer washing greatly reduces the spread of germs. The hospital observed the amount of time that a sample of nine nurses in the Cardiac Care Unit (CCU) washed their hands. The data were collected in such a way that the employees did not know that they were being observed. The hospital then showed the nurses an educational video on the negative effects of shortened time spent hand washing. After the video, the hospital again watched and timed the group of nurses washing their hands. The data are shown below:

|Observation |Unit |Time 1 (s) |Time 2 (s) |

|1 |CCU |3 |16 |

|2 |CCU |2 |7 |

|3 |CCU |0 |5 |

|4 |CCU |5 |8 |

|5 |CCU |2 |15 |

|6 |CCU |0 |15 |

|7 |CCU |2 |20 |

|8 |CCU |3 |16 |

|9 |CCU |0 |18 |

a) Calculate the differences between the times for each nurse. Just looking at the differences, do you think that, on the average, they washed their hands longer the second time? Why or why not?

b) Calculate the average difference and the standard deviation of the differences.

c) Set up the hypotheses to test whether there was an increase in the average amount of time spent washing hands.

d) Assuming that the data are normally distributed, at the 0.05 level of significance, what can you conclude?

e) Can you conclude that the video caused the nurses to wash their hands longer? Why or why not?

[pic]

Problem 10.7

Women who smoke suffer an increased risk of dying of breast cancer, according to a recently published study. In the study, out of 319,000 women who never smoked there were 468 deaths from breast cancer, whereas out of 120,000 smokers, there were 187 deaths.

a) Calculate the sample proportion of women who died of breast cancer for smokers and non-smokers.

b) Set up the hypotheses to test whether the proportion of women who die of breast cancer is higher for smokers than non-smokers.

c) At the 0.05 significance level, can you conclude that smoking causes breast cancer? If not, what can you conclude?
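A sketch of the two-proportion z test for Problem 10.7 follows. It assumes the usual pooled estimate of the common proportion under H0; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Counts from Problem 10.7
deaths_smokers, n_smokers = 187, 120_000
deaths_nonsmokers, n_nonsmokers = 468, 319_000
alpha = 0.05

p_smokers = deaths_smokers / n_smokers
p_nonsmokers = deaths_nonsmokers / n_nonsmokers

# Pooled proportion: the best estimate of the common rate if H0 is true
p_pooled = (deaths_smokers + deaths_nonsmokers) / (n_smokers + n_nonsmokers)
se = sqrt(p_pooled * (1 - p_pooled) * (1 / n_smokers + 1 / n_nonsmokers))
z = (p_smokers - p_nonsmokers) / se

# Upper-tailed test, H1: proportion for smokers > proportion for non-smokers
p_value = 1 - NormalDist().cdf(z)
reject_h0 = p_value < alpha

print(f"p1 = {p_smokers:.5f}, p2 = {p_nonsmokers:.5f}, z = {z:.3f}, "
      f"p-value = {p_value:.4f}, reject H0: {reject_h0}")
```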

Problem 10.18

Selling personal computers is big business and consumers are becoming increasingly aware of vendor reputation. A recent study of two vendors of desktop personal computers reports on the units that need repair for Dell Computers and Gateway 2000. Of 1584 computers manufactured by Dell Computer 427 needed repair, whereas for Gateway 2000, 825 of 2662 computers needed repair.

a) Calculate the sample proportion of computers needing repair for each company.

b) Set up the hypotheses to test whether the proportion of computers needing repairs is different for the two companies.

c) At the 0.05 level of significance, what can you conclude?

Minitab Solution

[pic]

[pic]

(No Excel Solution)

10.20

Consider the problem in which the Board of Realtors for Greater Bridgeport, CT, was looking at the average selling prices of homes. The data are given again below:

| |Population |

| |1995 |1996 |

|Sample Size |25 |25 |

|Sample Mean |$151,116 |$160,669 |

|Sample Standard Deviation |$5,332 |$6,468 |

a) Assuming that the populations are normally distributed, set up the hypotheses to test whether the population variances are equal at the 0.10 level of significance.

b) Was the decision to test using the pooled variance justified?
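The F test for equal variances in Problem 10.20 can be sketched from the summary statistics. By convention the larger sample variance goes in the numerator; the critical value below is an approximate F-table value and is hard-coded because the standard library has no F distribution.

```python
# Sample standard deviations and sizes from the table in Problem 10.20
s_1995, n_1995 = 5332, 25
s_1996, n_1996 = 6468, 25

# Put the larger variance on top so that F >= 1
var_large = max(s_1995, s_1996) ** 2
var_small = min(s_1995, s_1996) ** 2
f_stat = var_large / var_small

# Two-tailed test at alpha = 0.10, so alpha/2 = 0.05 in the upper tail.
# Approximate F-table value for (24, 24) degrees of freedom.
f_critical = 1.98
reject_h0 = f_stat > f_critical

print(f"F = {f_stat:.3f}, critical value = {f_critical}, reject H0: {reject_h0}")
```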

10.21

In your quest for the perfect golf clubs you made an assumption about the population variances when you tested your hypotheses. The data you collected are given below:

| |Population |

| |Brand X |Brand Z |

|Sample Size |15 |15 |

|Sample Mean |255 |271 |

|Sample Standard Deviation |8.7 |9.1 |

a) Set up the appropriate hypotheses to test whether the variance of Brand Z clubs is the same as the variance for Brand X.

b) Assuming that the populations are normally distributed, at the 0.10 level of significance was your decision to pool the variances a good one?

c) In general, would a difference in variation between the clubs be a factor in your purchase decision?

Procedures for Testing Independence

Experimental Design and ANOVA (Analysis of Variance)

1. Definition of terms

o Factor and response variable (428)

o ANOVA and treatment (410)

2. Sources of Variance (411)

• Treatment or Between Groups Variation (a.k.a. explained, factor, treatment)

• Random or Within Groups Variation (a.k.a. unexplained, random, error)

3. One-way ANOVA (410)

• Review of variables

- quantitative vs. qualitative

- dependent vs. independent

• Using ANOVA as procedure for comparing means of two or more groups

• Using ANOVA as procedure for determining whether a qualitative independent variable and quantitative dependent variable are related

4. Two-Way ANOVA with Replication – a.k.a. Two-way ANOVA with Interaction (427)

• Using ANOVA as procedure for comparing means of two or more groups (Factor A and Factor B)

• Using ANOVA as procedure for determining whether a qualitative independent variable and quantitative dependent variable are related (Factor A and Factor B)

• Testing the presence of interaction between Factors A and B

ANALYSIS OF VARIANCE (ANOVA)

ONE-WAY ANOVA

H0: μ1 = μ2 = μ3 = … = μt

H1: The population means are not all the same

Reject H0 if: F > F(α, v1, v2)

Where: v1 = (t-1)

v2 = (N - t)

TWO-WAY ANOVA (with replication)

1. Testing for Main Effects (Factor A)

H0: μ1 = μ2 = μ3 = … = μt (No level of factor A has an effect)

H1: The population means are not all the same (at least 1 level has an effect)

Reject H0 if: MSA/MSE > F(α, v1, v2)

Where: v1 = (a -1)

v2 = ab(r -1)

2. Testing for Main Effects (Factor B)

H0: μ1 = μ2 = μ3 = … = μt (No level of factor B has an effect)

H1: The population means are not all the same (at least 1 level has an effect)

Reject H0 if: MSB/MSE > F(α, v1, v2)

Where: v1 = (b -1)

v2 = ab(r -1)

3. Testing for INTERACTION EFFECTS (AB)

H0: There are NO interaction effects

H1: At least 1 combination of factor A and B levels has an effect

Reject H0 if: MSAB/MSE > F(α, v1, v2)

Where: v1 = (a -1)(b - 1)

v2 = ab(r -1)

Learning it! Exercises

[pic]14.1

A diaper company is considering 3 different filler materials for their disposable diapers. Eight diapers were tested with each of the 3 filler materials, and 24 toddlers were randomly given a diaper to wear. As the child played, fluid was injected into the diaper every 10 minutes until the product failed (leaked). The amount of fluid (in grams) at the time of failure was recorded for each diaper. The data are shown below:

|Material 1 |Material 2 |Material 3 |

|791 |809 |828 |

|789 |818 |814 |

|796 |803 |855 |

|802 |781 |844 |

|810 |813 |847 |

|790 |808 |848 |

|800 |805 |836 |

|790 |811 |873 |

a) What is the response variable and what is the factor?

b) How many levels of the factor are being studied?

c) Is there any difference in the average amount of fluid the diaper can hold using the three different filler materials? If so, which ones are different?

d) What is your recommendation to the company and why?

MINITAB OUTPUT

Stat > ANOVA > One-way (or One-way (Unstacked))

Results for: Problem 14_1.MTW

One-way ANOVA: Grams versus Material

Analysis of Variance for Grams

Source DF SS MS F P

Material 2 9808 4904 29.83 0.000

Error 21 3452 164

Total 23 13260

Individual 95% CIs For Mean

Based on Pooled StDev

Level N Mean StDev -------+---------+---------+---------

1 8 796.00 7.50 (----*----)

2 8 806.00 11.12 (----*----)

3 8 843.00 17.70 (----*---)

-------+---------+---------+---------

Pooled StDev = 12.82 800 820 840

EXCEL OUTPUT

Data > Data Analysis > Anova: Single Factor

|Anova: Single Factor | | | | | | |

| | | | | | | |

|SUMMARY | | | | | | |

|Groups |Count |Sum |Average |Variance | | |

|Mat1 |8 |6368 |796 |56.28571 | | |

|Mat2 |8 |6448 |806 |123.7143 | | |

|Mat3 |8 |6745 |843.125 |314.4107 | | |

| | | | | | | |

| | | | | | | |

|ANOVA | | | | | | |

|Source of Variation |SS |df |MS |F |P-value |F crit |

|Between Groups |9864.083 |2 |4932.042 |29.92679 |0.000000711968 |3.466795 |

|Within Groups |3460.875 |21 |164.8036 | | | |

| | | | | | | |

|Total |13324.96 |23 | | | | |

| | | | | | | |
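The one-way ANOVA table for Problem 14.1 can be rebuilt from the raw data with a short sketch. The sums of squares and F value it produces should agree with the Excel output for the data as listed; the F critical value is copied from that Excel output rather than computed.

```python
from statistics import mean

# Grams of fluid at failure for each filler material (Problem 14.1)
groups = {
    "Material 1": [791, 789, 796, 802, 810, 790, 800, 790],
    "Material 2": [809, 818, 803, 781, 813, 808, 805, 811],
    "Material 3": [828, 814, 855, 844, 847, 848, 836, 873],
}

all_values = [x for g in groups.values() for x in g]
grand_mean = mean(all_values)
t = len(groups)        # number of treatments (factor levels)
N = len(all_values)    # total number of observations

# Between-groups (treatment) and within-groups (error) sums of squares
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)

ms_between = ss_between / (t - 1)   # v1 = t - 1
ms_within = ss_within / (N - t)     # v2 = N - t
f_stat = ms_between / ms_within

f_critical = 3.466795  # "F crit" from the Excel output (alpha = 0.05, df = 2 and 21)
print(f"SSB = {ss_between:.3f}, SSW = {ss_within:.3f}, F = {f_stat:.3f}, "
      f"reject H0: {f_stat > f_critical}")
```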

14.3

Grading homework is a real problem. It takes an enormous amount of time and many students do not do a very good job or copy answers from other students or the back of the book. A teacher of elementary statistics decided to conduct a study to determine what effect grading homework had on her students’ exam scores. She taught 3 sections of Elementary Statistics and randomly assigned each class one of three conditions: (1) no homework given, (2) homework given, but not collected, and (3) homework given, collected, and graded. After the first exam, she collected the data (exam scores). They are shown in the Excel data file Homework.xls.

a) What is the response variable and what is the factor?

b) How many levels of the factor are being studied?

c) Is there any difference in the average exam scores among the three homework conditions? If so, which ones are different?

d) What is your recommendation to the teacher and why?

MINITAB OUTPUT:

Results for: Problem 14_3.MTW

One-way ANOVA: C2 versus C1

Analysis of Variance for C2

Source DF SS MS F P

C1 2 1700.4 850.2 8.91 0.001

Error 45 4295.4 95.5

Total 47 5995.8

Individual 95% CIs For Mean

Based on Pooled StDev

Level N Mean StDev -------+---------+---------+---------

1 16 74.500 11.051 (------*------)

2 16 70.313 9.016 (------*------)

3 16 84.500 9.107 (------*------)

-------+---------+---------+---------

Pooled StDev = 9.770 70.0 77.0 84.0

[pic]

14.16

The manufacturer of batteries is designing a battery to be used in a device that will be subjected to extremes in temperature. The company has a choice of 3 materials to use in the manufacturing process. An experiment is designed to study the life of the battery when it is made from materials A, B, C and is exposed to temperatures of 15, 70, and 125 degrees Fahrenheit. For each combination of material and temperature, 4 batteries are tested. The lifetimes in hours of the batteries are shown below:

| |Temperature |

| |15°F |70°F |125°F |

|Material A |130 |34 |20 |

| |155 |40 |70 |

| |74 |80 |82 |

| |180 |75 |58 |

|Material B |150 |126 |25 |

| |188 |122 |70 |

| |159 |106 |58 |

| |126 |115 |45 |

|Material C |138 |174 |96 |

| |110 |120 |104 |

| |168 |150 |82 |

| |160 |139 |83 |

a) Calculate the average life for each of the material types.

b) Calculate the average life for each of the 3 temperatures.

c) Calculate the average life for each of the 9 treatment groups.

d) Plot the 9 treatment means on a graph with the temperature factor on the x axis and the life of the battery in hours on the y axis. Use a different color for each of the 3 materials and connect the averages for those of the same material. What do you speculate about the interaction effect based on the graph?

e) Confirm your suspicions by doing a two-way ANOVA and testing to see if there is a significant interaction effect.

f) What materials do you recommend to this manufacturer and why?

MINITAB OUTPUT

Stat > ANOVA > Two-way

[pic]

[pic]

Interaction Plot

Stat > ANOVA > Interaction Plot

[pic]

[pic]

Excel Solution

Data > Data Analysis > Anova: Two-Factor With Replication

[pic]

[pic]

[pic]14.20

A manufacturer of adhesive products designed an experiment to compare a new adhesive product to a competitor’s product. The adhesive product, or glue, is used by automobile manufacturers. The response variable was the strength of the glue measured by tensile strength in pounds per square inch (psi). The ability to adhere to oil-contaminated surfaces under different humidity conditions was studied. There were 2 levels for factor A: no oil or oil. Oil contamination was applied by hand dipping the samples in an oil solution and allowing them to air dry at room temperature for 2 hours. There were two levels for factor B: 50% humidity and 90% humidity. Three samples were tested for each of the combinations of factor A and factor B. The tensile values (psi) for the new product are shown below:

|Humidity |No Oil |Oil |

|50% |175 |43 |

| |100 |42 |

| |175 |44 |

|90% |95 |95 |

| |115 |105 |

| |85 |116 |

a) Does the product behave significantly differently if the surface is oil contaminated?

b) Does the product behave significantly differently at different humidity levels?

c) Is there any significant interaction effect present?

Two-way ANOVA: PSI versus Surface, Humidity

Analysis of Variance for PSI

Source DF SS MS F P

Surface 1 7500 7500 13.52 0.006

Humidity 1 85 85 0.15 0.705

Interaction 1 9747 9747 17.56 0.003

Error 8 4439 555

Total 11 21772

Individual 95% CI

Surface Mean ----------+---------+---------+---------+-

No oil 124.2 (--------*--------)

Oil 74.2 (--------*--------)

----------+---------+---------+---------+-

75.0 100.0 125.0 150.0

Individual 95% CI

Humidity Mean ---------+---------+---------+---------+--

50% 96.5 (-----------------*------------------)

90% 101.8 (------------------*-----------------)

---------+---------+---------+---------+--

84.0 96.0 108.0 120.0
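The two-way ANOVA table above can be reproduced from the raw data in Problem 14.20 with the sketch below. It assumes a balanced design (the same number of replicates in every cell), which holds here; the critical value is an F-table value for 1 and 8 degrees of freedom at the 0.05 level.

```python
from statistics import mean

# Tensile strength (psi): three replicates per (surface, humidity) cell
cells = {
    ("No oil", "50%"): [175, 100, 175],
    ("Oil", "50%"): [43, 42, 44],
    ("No oil", "90%"): [95, 115, 85],
    ("Oil", "90%"): [95, 105, 116],
}
surfaces = ["No oil", "Oil"]    # factor A levels
humidities = ["50%", "90%"]     # factor B levels
a, b, r = len(surfaces), len(humidities), 3

grand = mean([x for obs in cells.values() for x in obs])

def level_mean(index, level):
    """Mean of all observations at one level of factor A (index 0) or B (index 1)."""
    return mean([x for key, obs in cells.items() if key[index] == level for x in obs])

ss_a = b * r * sum((level_mean(0, s) - grand) ** 2 for s in surfaces)
ss_b = a * r * sum((level_mean(1, h) - grand) ** 2 for h in humidities)
ss_cells = r * sum((mean(obs) - grand) ** 2 for obs in cells.values())
ss_ab = ss_cells - ss_a - ss_b          # interaction sum of squares
ss_e = sum((x - mean(obs)) ** 2 for obs in cells.values() for x in obs)

mse = ss_e / (a * b * (r - 1))          # v2 = ab(r - 1)
f_a = (ss_a / (a - 1)) / mse
f_b = (ss_b / (b - 1)) / mse
f_ab = (ss_ab / ((a - 1) * (b - 1))) / mse

f_critical = 5.32  # F-table value for alpha = 0.05, df = 1 and 8
for name, f in (("Surface", f_a), ("Humidity", f_b), ("Interaction", f_ab)):
    print(f"{name}: F = {f:.2f}, significant: {f > f_critical}")
```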

[pic]

Analysis of Qualitative Data (Chi-square)

1. The Chi-square test Explained (640)

2. Four Uses of the Chi-square Distribution

a) Testing for goodness-of-fit (641)

• is a test for comparing a theoretical distribution, such as a Normal, Poisson, etc., with the observed data from a sample

• answers the question: “does the sample come from a specified distribution?”

b) Testing (comparing) proportions of two or more groups (651)

c) Testing whether two categorical (a.k.a. nominal, qualitative, classification) variables are independent (651)

d) Testing the variance of a population (covered in earlier chapter)

[pic] Chi-square Concepts and Solved Problems are Available

CHI-SQUARE (χ²) DISTRIBUTION

Goodness-of-fit Test

H0: The sample comes from a specified distribution

H1: The sample does not come from a specified distribution

Reject H0 if: χ² > χ²(α, k − 1)

Test of Independence of 2 Categorical Variables

(Also used for comparing proportions of 2 or more groups)

H0: Variables 1 and 2 are independent

H1: Variables 1 and 2 are dependent

Reject H0 if: χ² > χ²(α, (r − 1)(c − 1))

Testing A Population Variance (σ²)

H0: σ² = value                H0: σ² ≥ value           H0: σ² ≤ value

H1: σ² ≠ value                H1: σ² < value           H1: σ² > value

Reject H0 if:                 Reject H0 if:            Reject H0 if:

χ² > χ²(α/2, n−1)             χ² < χ²(1−α, n−1)        χ² > χ²(α, n−1)

or χ² < χ²(1−α/2, n−1)

Learning it! Exercises

[pic]

15.1

The administration of a university has been using the following distribution to classify ages of their students:

|Age |Estimated % of |

|Group |Student Population |

|Less than 18 |2.7 |

|18 – 19 |29.9 |

|20 – 24 |53.4 |

|Older than 24 |14 |

A recent student survey provided the following data on age of students:

|Age | |

|Group |Frequency |

|Less than 18 |6 |

|18 – 19 |118 |

|20 – 24 |102 |

|Older than 24 |26 |

a) Set up a table that compares the expected and observed frequencies for each group.

b) Based on the table, do you think that the data represent the established distribution?

c) Set up the hypothesis for the Chi-square goodness of fit test.

d) Perform the goodness of fit test at the 0.05 significance level.

e) Based on the chi-square test, is the age distribution that the university has been using correct?
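The goodness-of-fit calculation for Problem 15.1 can be sketched as follows. Expected frequencies are the hypothesized percentages applied to the survey total; the critical value is a chi-square table value for k − 1 = 3 degrees of freedom.

```python
# Hypothesized percentages and observed survey counts (Problem 15.1)
age_groups = ["Less than 18", "18 - 19", "20 - 24", "Older than 24"]
expected_pct = [2.7, 29.9, 53.4, 14.0]
observed = [6, 118, 102, 26]

total = sum(observed)
expected = [pct / 100 * total for pct in expected_pct]

# Chi-square statistic: sum of (observed - expected)^2 / expected over all groups
chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

chi2_critical = 7.815  # chi-square table value for alpha = 0.05, df = 3
reject_h0 = chi2_stat > chi2_critical

for group, o, e in zip(age_groups, observed, expected):
    print(f"{group:<14} observed = {o:>3}  expected = {e:7.2f}")
print(f"chi-square = {chi2_stat:.2f}, reject H0: {reject_h0}")
```

The smallest expected frequency (about 6.8 for the "Less than 18" group) is above the common rule-of-thumb minimum of 5, so no groups need to be combined.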

[pic]

15.2

As part of a survey on the use of Office Suites Software, the company doing the polling wanted to know whether its population was uniformly distributed over the following age distribution: under 25, 25 to 44, 45 and up. The company looked at the data it had collected so far and found the following distribution:

|Age |Number of Respondents |

|Group | |

|Under 25 |73 |

|25 to 44 |61 |

|45 and up |66 |

| |200 |

a) Based on the data, do you think that the respondents are uniformly distributed over the age categories?

b) Set up the hypothesis to test whether the data are uniformly distributed over the age categories.

c) Find the expected frequency distribution and perform the chi-square goodness of fit test.

d) At the 0.05 level of significance, would you say that the respondents were uniformly distributed over the age groups?

[pic] 15.6

In an experiment to study the attitude of voters concerning term limitations in Congress, voters in Indiana, Ohio, and Kentucky were polled with the following results:

|Opinion |Indiana |Kentucky |Ohio |

|Support |82 |107 |93 |

|Do Not Support |97 |66 |74 |

a) Set up the hypothesis to test whether the proportion of voters who support congressional term limits is the same for all three states.

b) Calculate the proportion of voters that support congressional term limits for each state individually. Based on these values, do you think there is a difference in the proportions?

c) Calculate the overall proportion of voters who support term limits for Congress.

d) Calculate the expected frequencies for each cell and find the value of the chi-square test statistic.

e) At the 0.05 level of significance, is there a difference in the proportion of voters who support congressional term limits among the three states?

Minitab

Stat > Tables > Chi-Square Test

[pic]

[pic]

(No Excel Solution)

15.7

In a survey about satisfaction with local phone service, those respondents who rated their current service as excellent and those who rated Poor to Very Poor were asked to classify their current local service provider. The results are given in the table below:

| |Type of Company |

|Current Service Source |Long Distance |Local Phone |Power |Cable TV |Cellular Phone |

|Excellent |264 |444 |131 |215 |198 |

|Poor – Very Poor |1394 |1318 |485 |431 |572 |

a) Set up the hypothesis to test whether the proportion of people who rated their company as excellent is the same for each type of company.

b) Calculate the proportion of people who rate their current phone service as excellent.

c) Calculate the expected frequencies for each cell and find the value of the chi-square test statistic.

d) If you wanted to perform the test at the 0.05 significance level, what would be the critical value of the test?

e) At the 0.05 level of significance, is there a difference in the proportion of people who rate their local phone service as excellent among the different types of companies?

Chi-Square Test: Long Distance, Local Phone, Power, CableTV, Cellular Phone

Expected counts are printed below observed counts

Long Dis Local Ph Power CableTV Cellular Total

1 264 444 131 215 198 1252

380.74 404.63 141.46 148.35 176.82

2 1394 1318 485 431 572 4200

1277.26 1357.37 474.54 497.65 593.18

Total 1658 1762 616 646 770 5452

Chi-Sq = 35.796 + 3.831 + 0.773 + 29.947 + 2.536 +

10.671 + 1.142 + 0.230 + 8.927 + 0.756 = 94.610

DF = 4, P-Value = 0.000
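The Minitab output above can be reproduced by hand. The following Python sketch assumes a helper named `chi_square_table` (a hypothetical name, not a Minitab or library function) that computes expected counts as row total times column total divided by the grand total.

```python
def chi_square_table(observed):
    """Return expected counts, chi-square statistic, and df for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count for each cell: (row total * column total) / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi_sq = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return expected, chi_sq, df


# Rows: Excellent, Poor to Very Poor. Columns: the five company types.
observed = [
    [264, 444, 131, 215, 198],
    [1394, 1318, 485, 431, 572],
]
expected, chi_sq, df = chi_square_table(observed)

print([round(e, 2) for e in expected[0]])
print(f"Chi-Sq = {chi_sq:.3f}, DF = {df}")
```

The same function can be applied to any of the contingency tables in this section by changing the `observed` list.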

15.10

A report by the Department of Justice on rape victims reports on interviews with 3721 victims. The attacks were classified by age of the victim and the relationship of the victim to the rapist. The results of the study are given here:

| |Relationship of Rapist |

|Age of Victim |Family |Acquaintance or Friend |Stranger |

|Under 12 |153 |167 |13 |

|12 to 17 |230 |746 |172 |

|Over 17 |269 |1232 |739 |

a) Set up the hypotheses to test whether age of victim and relationship of rapist are independent.

b) Calculate the expected frequencies for each cell.

c) How many degrees of freedom will the chi-square test for independence have? Using a level of significance of 0.01, what is the critical value for the test?

d) Calculate the value of the chi-square test statistic.

e) Is the age of the victim independent of the relationship to the rapist?

MINITAB

Stat > Table > Chi-square Test

Chi-Square Test: C1, C2, C3

Expected counts are printed below observed counts

C1 C2 C3 Total

1 153 167 13 333

58.35 191.96 82.69

2 230 746 172 1148

201.15 661.77 285.07

3 269 1232 739 2240

392.50 1291.27 556.24

Total 652 2145 924 3721

Chi-Sq =153.539 + 3.246 + 58.734 +

4.136 + 10.720 + 44.849 +

38.857 + 2.720 + 60.050 = 376.852

DF = 4, P-Value = 0.000
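As a worked check of the output above, the standard formulas for the test of independence are written out below. The symbols $R_i$, $C_j$, and $n$ (row total, column total, grand total) are shorthand introduced here for illustration; they do not appear in the Minitab output.

```latex
E_{ij} = \frac{R_i \, C_j}{n}, \qquad
\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
df = (r - 1)(c - 1)

E_{11} = \frac{333 \times 652}{3721} \approx 58.35, \qquad
df = (3 - 1)(3 - 1) = 4
```

With 4 degrees of freedom and a 0.01 level of significance, the table critical value is 13.277. The computed statistic of 376.852 is far larger, which agrees with the reported P-Value of 0.000.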


15.11

A company that manufactures cardboard boxes for packaging cereals wants to determine whether the type of defect that a particular box has is related to the shift on which it was produced. It compiles the following data. In each case, if a box had multiple defects, the most serious defect was recorded.

| |Type of Defect |

|Shift |Printing |Rips/Tears |Size |

|1 |55 |60 |85 |

|2 |58 |63 |79 |

|3 |89 |63 |48 |

a) Set up the appropriate hypotheses for the test.

b) Calculate the expected frequencies for each cell and calculate the value of the chi-square test statistic.

c) How many degrees of freedom will the chi-square test for independence have?

d) Using a level of significance of 0.01, are defect type and shift related?

Chi-Square Test: Printing, Rips/Tears, Size

Expected counts are printed below observed counts

Printing Rips/Tea Size Total

1 55 60 85 200

67.33 62.00 70.67

2 58 63 79 200

67.33 62.00 70.67

3 89 63 48 200

67.33 62.00 70.67

Total 202 186 212 600

Chi-Sq = 2.259 + 0.065 + 2.907 +

1.294 + 0.016 + 0.983 +

6.972 + 0.016 + 7.270 = 21.782

DF = 4, P-Value = 0.000

Simple Linear Regression and Correlation

(454-491)

Important Definition of Terms

Test of Independence

Variables

Quantitative (measured)

Qualitative (category, classification, nominal)

Scatter plot

Regression and correlation

Linear vs. Curvilinear models

Simple vs. Multiple Linear Models

Correlation coefficient

Coefficient of determination

Residual (error) term

Observed y vs. expected y

Important Symbols

Y

X

Sy/x

R2

R

a

b

Problems for Simple Linear Regression:

11.2 (p. 553)

Problems for Multiple Linear Regression:

Problem 12.1

Problem 12.5

Problem 12.9

Steps in Regression/Correlation Analysis

1. Identify the response (y) and candidate predictor variables (x’s)

2. Collect a set of (y, x) data

3. Plot each x versus y

4. From the plots in #3, select the most promising x

5. Perform Regression and Correlation Analysis

a. Select a model (linear or nonlinear) that fits the plot and then generate the regression equation using Excel or Minitab

b. Test the resulting model for significance using the slope ($\beta$), correlation ($\rho$), or the ANOVA tests

(If resulting model is NOT significant, go back to Step 1)

c. Test the model for appropriateness using the analysis of residuals. This tests whether the assumptions on the residual are met. These assumptions are:

▪ Normal distribution

▪ Homoscedastic

▪ Independent

(If selected model is not appropriate, go back to Step 5a, else proceed to Step 7)

6. If model generated in Step 5 is significant but not appropriate, choose a different model (perhaps use curvilinear model) and repeat Step 5 until an appropriate model is found.

7. Use model for estimating:

(1) the response variable (y)

Point Estimate – substitute the value of x into the regression equation

Interval Estimates:

1. Prediction Interval Estimate

2. Confidence Interval Estimate

(2) the actual slope ($\beta$) of the line

CI = $b \pm t_{\alpha/2,\, n-2}\, s_b$

Definitions of Relevant Terms

Types of Variables:

y – response variable (a.k.a. dependent, predicted, explained)

x – independent variable (a.k.a. predictor, explanatory)

Regression – provides a ‘best-fit’ mathematical equation for the values of the y, x variables

-- expresses the relationship of y and x in equation form

-- the mathematical equation may be linear or curvilinear

linear: $Y = a + bX$ (Direct, linear)

$Y = a - bX$ (Inverse, linear)

curvilinear: $Y = a + bX + cX^2$ (quadratic)

$Y = e^{-X}$ (negative exponential)

$Y = 1/X$

Simple Linear Regression – regression model is linear with only ONE predictor variable

$y = b_0 + b_1 x$

Multiple Linear Regression – regression model is linear with TWO OR MORE predictor variables

$y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_k x_k$

Correlation Analysis – measures the strength of the relationship between Y,X

coefficient of correlation (r) – number that measures both the direction and the strength of the linear relationship between y and x

$-1 \le r \le +1$

coefficient of determination ($r^2$) – the percent of the variation in y that is explained by the regression model

$0\% \le r^2 \le 100\%$

Simple Linear Model and Assumptions

Models

Actual Population Model → $y = \beta_0 + \beta_1 x + \varepsilon$

Estimated (sample) Model → $\hat{y} = b_0 + b_1 x$

Assumptions on the residuals

1) Normally distributed

2) Homoscedastic (constant variance across all x values)

3) Statistically independent of each other

Testing the Model for Significance

Using the Slope ($\beta$) Test

$H_0: \beta = 0$

$H_1: \beta \ne 0$

Reject $H_0$ if $|t| > t_{\alpha/2,\, n-k-1}$

Using the coefficient of correlation ($\rho$) Test

$H_0: \rho = 0$

$H_1: \rho \ne 0$

Reject $H_0$ if $|t| > t_{\alpha/2,\, n-k-1}$

Using the ANOVA F-Test

$H_0$: the Model is not significant

$H_1$: the Model is significant

Reject $H_0$ if $F > F_{\alpha,\, v_1,\, v_2}$
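The F statistic, the coefficient of determination, and the standard error of the estimate all come from the same ANOVA sums of squares. A minimal Python sketch follows, assuming a helper named `anova_summary` (a hypothetical name for illustration) and using the regression and residual sums of squares from the Excel output for Problem 11.2 later in these notes.

```python
import math


def anova_summary(ssr, sse, n, k):
    """Summarize a regression ANOVA table.

    ssr: regression sum of squares, sse: residual sum of squares,
    n: number of observations, k: number of predictor variables.
    """
    df_regression = k
    df_residual = n - k - 1
    msr = ssr / df_regression
    mse = sse / df_residual
    return {
        "F": msr / mse,                 # test statistic for the ANOVA F-test
        "r_sq": ssr / (ssr + sse),      # coefficient of determination
        "std_error": math.sqrt(mse),    # standard error of the estimate
        "df": (df_regression, df_residual),
    }


# Sums of squares from the Problem 11.2 output (n = 12 states, k = 1 predictor).
result = anova_summary(ssr=2014.691, sse=54.698, n=12, k=1)
print(result)
```

The degrees of freedom returned in `df` are the v1 and v2 needed to look up the critical F value.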

Using the Model for Estimation/Prediction

Estimating the actual slope ($\beta$) of the model

b → point estimate of the actual slope ($\beta$) of the model

Computing a confidence interval for the actual slope of the model

C.I. for $\beta$ = $b \pm t_{\alpha/2,\, n-k-1}\,(s_b)$

Using the model to estimate the actual value of y for a given value of x

$\hat{y}$ → point estimate of the actual value of y for a given value of x

→ computed by substituting the value of x into the regression equation

Confidence Interval (C.I.) → the interval that contains the actual average value of the response variable ($\mu_{y/x}$) for a specific value of x

Prediction Interval (P.I.) → the interval that contains the actual value of the response variable (Y) for a specific value of x
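The confidence interval for the slope can be checked numerically. The sketch below is illustrative only: it takes the slope and its standard error from the Excel output for Problem 11.2 later in these notes, and hard-codes the t value for 10 degrees of freedom and a 95% confidence level from a standard t table.

```python
# Slope estimate and its standard error from the Problem 11.2 Excel output.
b = 0.021517
s_b = 0.001121

# n = 12 observations, k = 1 predictor, so df = n - k - 1 = 10.
n, k = 12, 1
df = n - k - 1

# Two-tailed critical value t(0.025, 10), taken from a t table.
t_crit = 2.228

# C.I. for the slope: b plus or minus t * s_b.
margin = t_crit * s_b
lower, upper = b - margin, b + margin

print(f"95% C.I. for the slope: ({lower:.6f}, {upper:.6f})")
```

The resulting limits agree with the "Lower 95%" and "Upper 95%" columns that Excel reports for the X variable in that output.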

SIMPLE LINEAR REGRESSION:

A Solved Example

EXAMPLE:

A manufacturer of small electric motors uses an automatic milling machine to produce the slots in the shaft of the motors. A batch of shafts is run and then checked. All shafts in the batch that do not meet required dimensional tolerances are discarded.

At the beginning of each new batch, the milling machine is readjusted since its cutter head wears slightly during the production of the batch. The manufacturer is trying to pick an optimal batch size, but in order to do this (s)he must know how the size of the batch affects the number of defective shafts in the batch. Thirty (30) batches were inspected, and the number of defectives in each batch was counted. The results are shown below:

Batchsize Defects

5

10

125. 6

125. 7

150. 6

150. 7

175. 17

175. 15

200. 24

200. 21

200. 22

225. 26

225. 29

225. 25

250. 34

250. 37

250. 41

250. 34

275. 49

300. 53

300. 54

325. 69

350. 82

350. 81

350. 84

375. 92

375. 96

375. 97

400. 109

400 112

INITIAL MODEL (LINEAR)

MINITAB SOLUTION

A. Plot Batchsize and Number of Defects

GRAPH > PLOT

Graph Variables:

X: Batchsiz

Y: Defects

STAT > REGRESSION > FITTED LINE PLOT

Response (Y): Defects

Predictor (X): Batchsiz

Type of regression model: Linear

B. Generate the Regression Equation

STAT > REGRESSION > REGRESSION

Response: Defects

Predictors (X): Batchsiz

Click on Results:

Select In addition, sequential sums of…

Click OK

Click on Storage

Select Fits

Select Residuals

Select Standardized Residuals

Click OK

Click OK


Generate the Residual Plots

STAT > REGRESSION > Residual Plots

Residuals: RESI1

Fits: FITS1

Click OK

EXCEL SOLUTION

Data > Data Analysis > Regression

Input:

Input Y range: Defects

Input X range: Batchsiz

Labels:

Output Range:

Residuals:

Residuals:

Standard Residuals:

Residual Plots: < select >

Line Fit Plots:

Normal Probability:

Normal Probability Plots: < select>


Analysis of Residual Plots

REVISED MODEL (NONLINEAR - Quadratic)

Minitab

Delete all non-empty columns except Defects and Batchsiz

Compute C3 = Batchsiz * Batchsiz

Calc > Calculator

Store result in variable: C3

Expression: Batchsiz*Batchsiz

Click OK

Name C3 as "Batchsiz^2"

STAT > REGRESSION > REGRESSION

Response: Defects

Predictors (X): Batchsiz Batchsiz^2

Click on Results:

Select In addition, sequential sums of…

Click OK

Click on Storage

Select Fits

Select Residuals

Select Standardized Residuals

Click OK

Click OK

Regression Analysis

The regression equation is

defects = 6.90 - 0.120 batchsiz +0.000950 batchsiz^2

Predictor Coef StDev T P

Constant 6.898 3.737 1.85 0.076

batchsiz -0.12010 0.03148 -3.82 0.001

batchsiz 0.00094954 0.00006059 15.67 0.000

S = 2.423 R-Sq = 99.5% R-Sq(adj) = 99.5%

Analysis of Variance

Source DF SS MS F P

Regression 2 34186 17093 2911.35 0.000

Residual Error 27 159 6

Total 29 34345

Source DF Seq SS

batchsiz 1 32744

batchsiz 1 1442
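The fitted quadratic equation can be used for point estimates by substituting a batch size. A small Python sketch follows, assuming a helper named `predict_defects` (hypothetical, for illustration) built from the coefficients in the Minitab output above.

```python
def predict_defects(batch_size):
    """Point estimate of defects from the fitted quadratic model.

    Coefficients are copied from the Minitab 'Coef' column above.
    """
    return 6.898 - 0.12010 * batch_size + 0.00094954 * batch_size ** 2


# Compare with the observed counts listed in the data (34 to 41 defects
# at batch size 250, and 109 and 112 defects at batch size 400).
for size in (250, 400):
    print(f"batch size {size}: predicted defects = {predict_defects(size):.1f}")
```

Predictions should only be made for batch sizes inside the range of the data used to fit the model.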

EXCEL SOLUTION

Create Batchsiz^2 column (must be adjacent to Batchsiz), where Batchsiz^2 = Batchsiz * Batchsiz

Data > Data Analysis > Regression

Input:

Input Y range: Defects

Input X range: highlight Batchsiz Batchsiz^2 range of data

Labels:

Output Range:

Residuals:

Residuals:

Standard Residuals:

Residual Plots: < select >

Line Fit Plots:

Normal Probability:

Normal Probability Plots: < select>


GENERATING PREDICTION AND CONFIDENCE INTERVALS FOR Y

(Minitab)

Stat > Regression > Regression > Option

(Note: Excel does not have this capability)

Problem 11.2

In trying to look at the effects of shopping center expansion, the Commerce Department decided to look at the relationship between the number of shopping centers and the retail sales for different states in the same region. It collected the data for the North Central states in the US and found the following:

|State |Number of Shopping Centers |Retail Sales ($ billion) |

|Illinois |2096 |41.8 |

|Indiana |905 |21.4 |

|Michigan |1018 |25.3 |

|Minnesota |471 |13.9 |

|Ohio |1704 |41.6 |

|Iowa |308 |7.5 |

|Missouri |887 |22.7 |

|Wisconsin |625 |14.6 |

|South Dakota |58 |1.3 |

|North Dakota |87 |2.1 |

|Nebraska |264 |5.7 |

|Kansas |481 |11.6 |

a) Create a scatter plot of the data.

b) Find the regression equation relating retail sales and number of shopping centers.

c) Plot the regression line on the same plot as the data. Do you think the line fits the data well? Why or why not?

d) Use the regression line to predict retail sales for each state.

e) Calculate the residuals for each state. Which state has the largest residual? Which state has the smallest? Do the residuals support your answer to part (d)?

f) Find the standard error of the estimate.


|SUMMARY OUTPUT | | | | | |

| | | | | | | |

|Regression Statistics | | | | | |

|Multiple R |0.9866955 | | | | | |

|R Square |0.9735681 | | | | | |

|Adjusted R Square |0.9709249 | | | | | |

|Standard Error |2.3387601 | | | | | |

|Observations |12 | | | | | |

| | | | | | | |

|ANOVA | | | | | | |

| |df |SS |MS |F |Significance F | |

|Regression |1 |2014.691 |2014.691 |368.330 |0.000 | |

|Residual |10 |54.698 |5.470 | | | |

|Total |11 |2069.389 | | | | |

| | | | | | | |

| |Coefficients |Standard Error |t Stat |P-value |Lower 95% |Upper 95% |

|Intercept |1.492612 |1.071387 |1.393158 |0.193764 |-0.894588 |3.879812 |

|X Variable 1 |0.021517 |0.001121 |19.191926 |0.000000 |0.019019 |0.024015 |
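The Excel coefficients above can be reproduced from the raw data with the least-squares formulas. This Python sketch is an illustration with made-up variable names, not the Excel procedure itself; it also computes the fitted values, residuals, and standard error of the estimate asked for in parts (d) through (f).

```python
import math

# Data from the Problem 11.2 table, in the order the states are listed.
states = ["Illinois", "Indiana", "Michigan", "Minnesota", "Ohio", "Iowa",
          "Missouri", "Wisconsin", "South Dakota", "North Dakota",
          "Nebraska", "Kansas"]
centers = [2096, 905, 1018, 471, 1704, 308, 887, 625, 58, 87, 264, 481]
sales = [41.8, 21.4, 25.3, 13.9, 41.6, 7.5, 22.7, 14.6, 1.3, 2.1, 5.7, 11.6]

n = len(centers)
x_bar = sum(centers) / n
y_bar = sum(sales) / n

# Least-squares slope and intercept.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(centers, sales))
s_xx = sum((x - x_bar) ** 2 for x in centers)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Fitted values, residuals, and standard error of the estimate.
fitted = [intercept + slope * x for x in centers]
residuals = [y - f for y, f in zip(sales, fitted)]
sse = sum(e ** 2 for e in residuals)
std_error = math.sqrt(sse / (n - 2))

print(f"sales = {intercept:.4f} + {slope:.6f} * centers")
print(f"standard error of the estimate = {std_error:.4f}")
for state, e in zip(states, residuals):
    print(f"{state}: residual = {e:.2f}")
```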

Problem 11.3

As part of an international study on energy consumption, data were collected on the number of cars in a country and the total travel in kilometers. The data for 12 of the countries are shown here:

|Country |Total Cars (million) |Travel (billion km) |

|US |142.35 |3140.29 |

|Finland |1.82 |34.66 |

|Denmark |1.66 |30.76 |

|Britain |21.32 |352.76 |

|Australia |8.53 |138.22 |

|Sweden |3.32 |53.21 |

|Netherlands |5.53 |83.69 |

|France |23.27 |348.2 |

|Norway |1.59 |23.54 |

|Italy |26.12 |367.85 |

|Germany |43.75 |608.52 |

|Japan |40.25 |439.30 |

a) Create a scatterplot of the data. Do you think that there is a relationship between the number of kilometers traveled and the number of cars?

b) Find the least-squares regression line for the data. Interpret the value of the slope.

c) Does the intercept make sense for these data? Why or why not?

d) Plot the regression line on the same plot with the data. Does the line make you feel confident about predicting travel as a function of the number of cars?

e) Use the regression line to predict the number of kilometers for Sweden and Japan. How well do the predictions agree with the original data?


|SUMMARY OUTPUT | | | | | |

| | | | | | | |

|Regression Statistics | | | | | |

|Multiple R |0.98503096 | | | | | |

|R Square |0.97028599 | | | | | |

|Adjusted R Square |0.96731458 | | | | | |

|Standard Error |156.136088 | | | | | |

|Observations |12 | | | | | |

| | | | | | | |

|ANOVA | | | | | | |

| |df |SS |MS |F |Significance F | |

|Regression |1 |7960585.694 |7960586 |326.5415 |0.0000 | |

|Residual |10 |243784.7804 |24378.48 | | | |

|Total |11 |8204370.475 | | | | |

| | | | | | | |

| |Coefficients |Standard Error |t Stat |P-value |Lower 95% |Upper 95% |

|Intercept |-106.2068 |55.1609 |-1.9254 |0.0831 |-229.1129 |16.6992 |

|X Variable 1 |21.5814 |1.1943 |18.0705 |0.0000 |18.9204 |24.2425 |

| | | | | | | |
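Part (e) of Problem 11.3 can be checked by substituting into the fitted line. The sketch below is illustrative: the coefficients are copied from the Excel output above, and the dictionary layout is an assumption made for the example.

```python
# Coefficients from the Problem 11.3 Excel output.
intercept = -106.2068
slope = 21.5814

# (cars in millions, observed travel in billion km) from the data table.
observed = {"Sweden": (3.32, 53.21), "Japan": (40.25, 439.30)}

# Point estimate: substitute the number of cars into the regression equation.
predictions = {
    country: intercept + slope * cars
    for country, (cars, _) in observed.items()
}

for country, (cars, travel) in observed.items():
    residual = travel - predictions[country]
    print(f"{country}: predicted = {predictions[country]:.2f}, "
          f"observed = {travel:.2f}, residual = {residual:.2f}")
```

The prediction for Sweden is negative, which is not a possible travel distance. This is a reminder that a high R Square does not guarantee good predictions for every observation.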

Problem 11.23

How much does advertising impact market penetration? To assess the impact of advertising in the tobacco industry, a study looked at the amount of money spent on advertising a particular brand of cigarettes and brand preference among adolescents and adults. The data are shown here:

|Brand |Advertising ($ million) |Adolescent Brand Preference (%) |Adult Brand Preference (%) |

|Marlboro |75 |60 |23.5 |

|Camel |43 |13.3 |6.7 |

|Newport |35 |12.7 |4.8 |

|Kool |21 |1.2 |3.9 |

|Winston |17 |1.2 |3.9 |

|Benson & Hedges |4 |1 |3.0 |

|Salem |3 |0.3 |2.5 |

a) Look at the data for brand preference for adolescents and amount spent on advertising. Which variable is the dependent variable? Which is the independent variable?

b) Create a scatter plot of advertising and adolescent brand preference. Do you think that there is a linear relationship between the two variables? Why or why not?

c) Now create another scatter plot using adult brand preference instead. How does this plot compare to the one for adolescent brand preference? From the plots, do you think that adolescent or adult brand preference is more strongly related to advertising expenditures? Why?

d) Find the least squares line for adolescent brand preference and advertising expenditures.

e) Interpret the meaning of the slope and intercept of the model. Do they make sense?

f) Use the model to predict adolescent brand preference for each brand studied. How well do the predicted values agree with the actual data?

g) Using a 0.05 significance level, is the model significant?


|ADOLESCENT MARKET | | | | |

| | | | | | | |

|SUMMARY OUTPUT | | | | | |

| | | | | | | |

|Regression Statistics | | | | | |

|Multiple R |0.923547 | | | | | |

|R Square |0.852939 | | | | | |

|Adjusted R Square |0.823527 | | | | | |

|Standard Error |9.063086 | | | | | |

|Observations |7 | | | | | |

| | | | | | | |

|ANOVA | | | | | | |

| |df |SS |MS |F |Significance F | |

|Regression |1 |2382.011 |2382.011 |28.99957 |0.002978 | |

|Residual |5 |410.6976 |82.13953 | | | |

|Total |6 |2792.709 | | | | |

| | | | | | | |

| |Coefficients |Standard Error |t Stat |P-value |Lower 95% |Upper 95% |

|Intercept |-9.42472 |5.365513 |-1.75654 |0.139344 |-23.2172 |4.367747 |

|Advertising ($m) |0.786227 |0.146 |5.385125 |0.002978 |0.410923 |1.161531 |

| | | | | | | |

|ADULT MARKET | | | | | |

| | | | | | | |

|SUMMARY OUTPUT | | | | | |

| | | | | | | |

|Regression Statistics | | | | | |

|Multiple R |0.901096 | | | | | |

|R Square |0.811974 | | | | | |

|Adjusted R Square |0.774369 | | | | | |

|Standard Error |3.536488 | | | | | |

|Observations |7 | | | | | |

| | | | | | | |

|ANOVA | | | | | | |

| |df |SS |MS |F |Significance F | |

|Regression |1 |270.0463 |270.0463 |21.59205 |0.005599 | |

|Residual |5 |62.53373 |12.50675 | | | |

|Total |6 |332.58 | | | | |

| | | | | | | |

| |Coefficients |Standard Error |t Stat |P-value |Lower 95% |Upper 95% |

|Intercept |-0.58794 |2.093665 |-0.28082 |0.790098 |-5.96987 |4.793986 |

|Advertising ($m) |0.264725 |0.05697 |4.646724 |0.005599 |0.118279 |0.411172 |

| | | | | | | |
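The "Multiple R" values in the two outputs above are the correlation coefficients between advertising and brand preference. A short Python sketch, assuming a helper named `pearson_r` (hypothetical, for illustration), computes them from the Problem 11.23 data so the adolescent and adult markets can be compared directly.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Data from the Problem 11.23 table (Marlboro through Salem).
advertising = [75, 43, 35, 21, 17, 4, 3]
adolescent = [60, 13.3, 12.7, 1.2, 1.2, 1, 0.3]
adult = [23.5, 6.7, 4.8, 3.9, 3.9, 3.0, 2.5]

r_adolescent = pearson_r(advertising, adolescent)
r_adult = pearson_r(advertising, adult)

print(f"adolescent: r = {r_adolescent:.4f}, r^2 = {r_adolescent ** 2:.4f}")
print(f"adult:      r = {r_adult:.4f}, r^2 = {r_adult ** 2:.4f}")
```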

Multiple Linear Regression

(560 to 595)

Problem 12.1

A group of legislators wanted to look at factors that affect the number of traffic fatalities. They collected some data for 1994 from the NTSB on the number of fatalities for 50 states and the District of Columbia, the number of licensed drivers, the number of registered vehicles, and the number of vehicle miles traveled. A portion of the data is shown on page 584. The full dataset is in traffat.xls.

|SUMMARY OUTPUT | | | | | | | |

| | | | | | | | |

|Regression Statistics | | | | | | | |

|Multiple R |0.982548538 | | | | | | |

|R Square |0.96540163 | | | | | | |

|Adjusted R Square |0.963193224 | | | | | | |

|Standard Error |154.5407481 | | | | | | |

|Observations |51 | | | | | | |

| | | | | | | | |

|ANOVA | | | | | | | |

| |df |SS |MS |F |Significance F | |

|Regression |3 |31321046.9 |10440349 |437.1485 |2.54274E-34 | | |

|Residual |47 |1122493.613 |23882.84 | | | | |

|Total |50 |32443540.51 | | | | | |

| | | | | | | | |

| |Coefficients |Standard Error |t Stat |P-value |Lower 95% |Upper 95% | |

|Intercept |51.7481659 |30.43306219 |1.700393 |0.095666 |-9.475200509 |112.9715 | |

|Licensed Drivers |0.06294764 |0.048829545 |1.28913 |0.203662 |-0.035284642 |0.16118 | |

|Registered Vehicles |-0.211896991 |0.055989427 |-3.78459 |0.000436 |-0.324533083 |-0.09926 | |

|Vehicle Miles Travelled |0.029349954 |0.003525079 |8.326041 |8.34E-11 |0.022258416 |0.036441 | |

| | | | | | | | |

a) How many independent variables are there in the model proposed? What are they?

b) Use the computer output to write down the regression model.

c) Interpret the coefficients of the model.

d) Use the model to predict the number of traffic fatalities for the states shown in the data table.

e) Compare the predicted values from the model to the actual values. Based on the plot, does the model do a good job of predicting the number of traffic fatalities?

Problem 12.9

In the problem about number of traffic fatalities the model was rerun, dropping the data on number of licensed drivers that had the lowest t statistic. The output is shown below:

Regression Analysis: Traffic Fata versus Registered V, Vehicle Mile

The regression equation is

Traffic Fatalities = 46.0 - 0.163 Registered Vehicles

+ 0.0300 Vehicle Miles Travelled

Predictor Coef SE Coef T P

Constant 46.04 30.32 1.52 0.135

Register -0.16280 0.04132 -3.94 0.000

Vehicle 0.029996 0.003513 8.54 0.000

S = 155.6 R-Sq = 96.4% R-Sq(adj) = 96.3%

Analysis of Variance

Source DF SS MS F P

Regression 2 31281357 15640679 645.98 0.000

Residual Error 48 1162183 24212

Total 50 32443541

a) Write down the equation of the new two-variable model.

b) Compare the new model to the model with three variables. How much does the model change when number of licensed drivers is dropped?

c) Compare the value of R2 for both models. What does this make you think about the decision to drop number of licensed drivers from the model?

d) Would you consider a two-variable model a good model? Why or why not?

e) Based on the value of the R2 would you be satisfied with this model or would you want to consider other variables?
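When comparing the three-variable and two-variable models, the adjusted R2 is the fairer yardstick because it penalizes extra predictors. The Python sketch below is illustrative: the helper name `adjusted_r_sq` is an assumption, and the inputs are copied from the Excel output for Problem 12.1 and the Minitab output for Problem 12.9.

```python
def adjusted_r_sq(r_sq, n, k):
    """Adjusted R-squared for n observations and k predictor variables."""
    return 1 - (1 - r_sq) * (n - 1) / (n - k - 1)


n = 51  # 50 states plus the District of Columbia

# Three-variable model: R Square from the Problem 12.1 Excel output.
r_sq_three = 0.96540163
adj_three = adjusted_r_sq(r_sq_three, n, k=3)

# Two-variable model: R-Sq = SS Regression / SS Total from the Minitab output.
r_sq_two = 31281357 / 32443541
adj_two = adjusted_r_sq(r_sq_two, n, k=2)

print(f"three variables: R-Sq = {r_sq_three:.4f}, R-Sq(adj) = {adj_three:.4f}")
print(f"two variables:   R-Sq = {r_sq_two:.4f}, R-Sq(adj) = {adj_two:.4f}")
```

Dropping the number of licensed drivers lowers R2 only slightly and leaves the adjusted value almost unchanged, which is the comparison parts (b) and (c) ask about.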

BA 282: APPLIED BUSINESS STATISTICS

Fall 1999 Name____________________

Midterm Exam

Part 1: Multiple Choice

1. Using the Standard Normal distribution, the area between –1.5 and –2.4 is:

a. 0.9250

b. 0.0586

c. -0.0568

d. -0.9250

e. None of the above

For questions 2 to 7, select the most appropriate pair of hypotheses for each statement.

2. The average age of SOU students is more than 21.5 years.

a. H0: ( ( 21.5 H1: ( > 21.5

b. H0: ( ( 21.5 H1: ( < 21.5

c. H0: ( = 21.5 H1: ( ( 21.5

d. H0: ( ( 21.5 H1: ( > 21.5

e. H0: ( ( 21.5 H1: ( < 21.5

3. A new medication for headache is touted to relieve pain in less than 5 minutes.

a. H0: ( ( 5 H1: ( > 5

b. H0: ( ( 5 H1: ( < 5

c. H0: ( = 5 H1: ( ( 5

d. H0: ( ( 5 H1: ( > 5

e. H0: ( ( 5 H1: ( < 5

4. A CPA review program is advertised as “guaranteed to improve your CPA test scores.” Fifty graduating accounting students from a business school were randomly assigned to two groups: one group in which students didn’t participate (D) in the review program and another group in which students participated (P) in the review program.

a. H0: (P ( (D H1: (P > (D

b. H0: (P ( (D H1: (P < (D

c. H0: (P = (D H1: (P ( (D

d. H0: (P ( (D H1: (P > (D

e. H0: (P ( (D H1: (P < (D

5. A councilperson claims that there is no difference in the level of support to Measure 51 among Republican (R) and Democratic (D) voters in the Rogue Valley.

a. H0: (R ( (D H1: (R > (D

b. H0: (R ( (D H1: (R < (D

c. H0: (R = (D H1: (R ((D

d. H0: (R ( (D H1: (R < (D

e. H0: (R = (D H1: (R ( (D

6. A filling machine is supposed to fill an average of 12 ounces when operating properly.

a. H0: ( ( 12 H1: ( > 12

b. H0: ( (12 H1: ( < 12

c. H0: ( = 12 H1: ( ( 12

d. H0: ( ( 12 H1: ( > 12

e. H0: ( ( 12 H1: ( < 12

7. A right-tailed test of a population mean results in a p-value that is practically zero. This means that the sample represents:

a. Weak evidence supporting the null hypothesis

b. Weak evidence supporting the alternative hypothesis

c. Strong evidence supporting the null hypothesis

d. Strong evidence supporting the alternative hypothesis

8. Of the following variations, which does not belong?

a. Common cause variation

b. Special cause variation

c. Explained variation

d. Nonrandom variation

9. A confidence interval estimate has two components – a point estimate and a margin of error. Which of these two components is affected by the confidence level?

a. Point estimate

b. Margin of Error

c. None of the above

10. A sample statistic (e.g. sample mean, sample proportion, or sample variance) is a random variable. Which type of random variable is a sample statistic?

a. Continuous

b. Discrete

11. "The distribution of the sample means of any type of distribution will approximate the normal distribution, as the sample size increases." This sounds like the definition of the

a. Standard Normal Distribution

b. Z-distribution

c. Central Limit Theorem

d. Binomial distribution approximated by the Normal distribution

e. None of the above

12. Which of the following does not belong?

a. s

b. p

c. x-bar

d. (

e. All of the above belong to the same group

13. Which of the following is NOT true of a sample mean?

a. It is a point estimate

b. It is a statistic

c. It is a continuous random variable

d. All of the above (a-c) are true of a sample mean

e. None of the above (a-c) are true of a sample mean

14. The two components of a confidence interval estimate of a population parameter are:

a. Confidence Level and Margin of Error

b. Sample Size and Statistic

c. Point Estimate and Margin of Error

d. Sample Mean and Sample Proportion

e. Margin of Error and Sample Size

15. The conditions in using the Normal distribution as an approximation to the binomial distribution are that np and n(1-p) be both at least 5.

a. True

b. False

16. Which of the following will be the benefit derived from using a larger sample size in estimating an unknown population parameter?

a. A larger margin of error

b. A smaller margin of error

c. A lower confidence level

d. A higher confidence level

e. (b) and/or (d)

f. (a) and/or (c)

17. Using the Standard Normal distribution table, find the area below the z-score of –2.50.

a. -0.4938

b. 0.4938

c. 0.9938

d. 0.0062

e. None of the above

18. Which of the following pairs of hypotheses is NOT correct?

a. H0: ( ( 3.5 H1: ( > 3.5

b. H0: ( ( 3.5 H1: ( < 3.5

c. H0: p < 0.035 H1: p > 0.035

d. All of the above are correct

e. None of the above are correct

19. For a one-tailed test of a population mean the significance level has been set at 0.01. Assume that the population standard deviation is not known, sample size is 10, and the population is normally distributed. What distribution is appropriate for performing the hypothesis test?

a. z-test

b. t-test

c. F-test

d. Binomial

e. None of the above

20. In testing the mean of a population, which of the following is a necessary condition for using a t distribution?

a. n is small

b. $\sigma$ is not known

c. The population is infinite

d. All of these

e. (a) and (b) but not (c)

21. Assume that you took a sample and calculated the sample mean as 100. You then calculated the lower and upper limits of a 90 percent confidence interval for $\mu$ to be 90 and 110, respectively. What is the margin of error of the estimate?

a. 0.10

b. 90 percent

c. 20

d. 10

e. 100

22. A single value used to estimate an unknown population parameter is known as a(n)

a. Point estimate

b. Interval estimate

c. Statistic

d. Parameter

e. (a) and (c)

23. In hypothesis testing, we conclude to reject or fail to reject the null hypothesis by comparing the computed statistic to a critical statistic. Another form of decision rule is by comparing the p-value to a significance level. Which of the following is a correct decision rule?

a. Reject H0 if z > p-value

b. Reject H0 if z > $\alpha$

c. Reject H0 if p-value > $\alpha$

d. Reject H0 if p-value < $\alpha$

e. All of the above are correct forms of decision rule

24. Which of the following variations cannot be removed but only can be reduced by redesigning or improving the process?

a. common cause variation

b. special cause variation

25. If $n = 45$ and $\alpha = 0.05$, then the critical value of z for testing the hypotheses

$H_0: \mu \le 3.5$ and $H_1: \mu > 3.5$ is

a. 0.0199

b. 1.96

c. -1.96

d. -1.645

e. 1.645

26. When a null hypothesis is rejected, it is possible that:

a. A correct conclusion has been made

b. A Type II error has been made

c. A Type I error has been made

d. (a) or (b) is correct

e. (a) or (c) is correct

27. Which of the following actions will reduce the Type I and II errors simultaneously?

a. Decreasing the significance level of a test

b. Increasing the confidence level of a test

c. Decreasing Beta error

d. Increasing the sample size

e. Decreasing the sample size

28. One concludes whether to “reject” or “fail to reject” the null hypothesis based on a decision rule. The decision rule is nothing more than a comparison of a calculated value and a critical value. Which of the two is based on the significance level of a test?

a. Calculated value

b. Critical value

29. Which of the following is NOT true of a critical value

a. It marks the boundary between the “reject H0” and the “fail to reject H0” regions

b. It is based on the significance level of a test

c. It is determined from the statistic derived from a sample

d. All of the above are true

e. None of the above are true

30. If one were to perform a hypothesis test using the following hypotheses:

H0: shipment is GOOD and H1: shipment is BAD Which of the two types of errors is called the Producer’s risk?

a. Type I (alpha)

b. Type II (beta)

c. Both (a) and (b)

d. Neither (a) nor (b)

31. When the sample size as a proportion of the population size (n/N) gets larger, the value of the finite correction multiplier approaches which value?

a. 0

b. 1

c. None of the above

32. In statistical process control, control charts are used as tools to monitor processes. All processes exhibit variability. When NOT in control, the ___________ variability is said to be present:

a. common cause variability

b. special cause variability

c. none of the above

33. A 5-week diet program is claimed to be effective in reducing the weights of participants in the program. Skeptical about the claim, you randomly select 10 applicants and weigh each one before and after the 5-week period. This problem involves:

a. Comparison of two population proportions

b. Comparison of means of two independent samples

c. Comparison of means of two dependent samples

d. Comparison of variations of two independent samples

34. Which of the following sampling distributions would be used in comparing means of two populations, with n1 = 32, n2 = 40?

a. Z test

b. pooled t test (or equal variances t test)

c. unpooled t test (or unequal variances t test)

d. Binomial

Use for questions 35-37

You work for a market research agency and you were asked to estimate the proportion of people with personal computers who are using Windows 97 as an operating system. How many people will you need to survey for your estimate to be within 2 percentage points of the actual value and be 90 percent confident with this estimate?

35. The problem described above involves:

a. Testing a hypothesis about a population mean

b. Computing a confidence interval estimate of a population average

c. Computing a sample size to estimate a true population proportion

d. Estimating a confidence interval estimate of a true population mean

e. None of the above

36. How much is the stated margin of error?

a. 90 percent

b. 10 percent

c. 0.10

d. 2 percentage points

e. 1.645

37. Give the z-value that will be used for computing the 90 percent confidence interval estimate of the true population parameter.

a. 1.96

b. 1.32

c. 0.10

d. 2 percentage points

e. 1.645

Use for questions 38- 43

C. Garr Smoke claims that no more than 5 percent of the 40-60 male age group smoke cigars. Of 2500 males of this age group you recently sampled, 200 said they smoke cigars. At the 0.05 significance level, do the data refute C. Garr Smoke’s belief?

38. The sample statistic in this problem is:

a. Population proportion

b. Population mean

c. Sample proportion

d. Sample mean

e. None of the above

39. In this problem the statement “no more than 5 percent” is:

a. The hypothesized population proportion

b. The hypothesized population mean

c. The sample proportion

d. The sample mean

e. None of the above

40. State the null and alternative hypotheses of this problem

a. H0: π ≤ 5 H1: π > 5

b. H0: π ≥ 5 H1: π < 5

c. H0: π = 5 H1: π ≠ 5

d. H0: π ≤ 0.05 H1: π > 0.05

e. H0: π ≥ 0.05 H1: π < 0.05

41. Identify the critical value for the test (one tail).

a. 0.0199

b. 1.96

c. -1.96

d. -1.645

e. 1.645

42. If the computed value for the test is 6.88, the p-value is

a. almost 1

b. almost 0

c. close to the significance level

43. If the resulting p-value of the test is less than the significance level, then you would conclude to

a. Reject the null hypothesis

b. Fail to reject the null hypothesis
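The computed value of 6.88 quoted in question 42 can be reproduced from the sample counts. This is a sketch of a one-proportion z test, assuming an upper-tail alternative (the claim "no more than 5 percent" is taken as the null hypothesis) and the normal approximation to the binomial. The variable names are illustrative only.

```python
import math
from statistics import NormalDist

n = 2500        # males sampled
smokers = 200   # number who said they smoke cigars
pi_0 = 0.05     # hypothesized population proportion
alpha = 0.05    # significance level

p_hat = smokers / n                            # sample proportion
std_error = math.sqrt(pi_0 * (1 - pi_0) / n)   # standard error under the null hypothesis
z_stat = (p_hat - pi_0) / std_error

# ASSUMPTION: upper-tail test, H1: proportion > 0.05.
z_crit = NormalDist().inv_cdf(1 - alpha)
p_value = 1 - NormalDist().cdf(z_stat)
reject_null = z_stat > z_crit

print(f"p_hat = {p_hat:.3f}, z = {z_stat:.2f}, critical z = {z_crit:.3f}")
print(f"p-value = {p_value:.2e}, reject H0: {reject_null}")
```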

Use for questions 44-47

A torque wrench used in the final assembly of cylinder heads requires a process average of 135 lbs-ft. The process is known to have a standard deviation of 5.0 lbs-ft. For a simple random sample of 45 sample nuts that the machine has recently tightened, the sample average is 137 lbs-ft. Using a 0.05 significance level, determine whether the machine is operating at the desired level.

44. The appropriate hypotheses for the problem are:

a. H0: μ ≤ 135 H1: μ > 135

b. H0: μ ≥ 135 H1: μ < 135

c. H0: μ = 135 H1: μ ≠ 135

d. None of the above are correct

45. The appropriate distribution for the test above is

a. t-distribution

b. z-distribution

c. F-distribution

46. The computed value is

a. -0.40

b. 0.40

c. -2.68

d. 2.68

e. None of the above

47. Assume that the critical value for this problem is 1.645 and that another sample produced a computed value of 1.55. For this sample, your statistical conclusion is to:

a. Reject the null hypothesis and conclude that the process is operating at the desired level

b. Accept the null hypothesis and conclude that the process is operating at the desired level

c. Reject the null hypothesis and conclude that the process is not operating at the desired level

d. Accept the null hypothesis and conclude that the process is not operating at the desired level
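The torque wrench problem can be worked the same way. The sketch below assumes a two-tailed z test, since the population standard deviation is given and the question is whether the process mean differs from 135 in either direction. The two-tailed reading is an interpretation of the problem statement, not something the exam states outright.

```python
import math
from statistics import NormalDist

mu_0 = 135.0    # required process average (lbs-ft)
sigma = 5.0     # known process standard deviation
n = 45          # sample size
x_bar = 137.0   # sample average
alpha = 0.05

std_error = sigma / math.sqrt(n)
z_stat = (x_bar - mu_0) / std_error

# ASSUMPTION: two-tailed test, so alpha is split between the two tails.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
reject_null = abs(z_stat) > z_crit

print(f"z = {z_stat:.2f}, critical z = ±{z_crit:.2f}, p-value = {p_value:.4f}")
print(f"reject H0: {reject_null}")
```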

Use for questions 48-50

A pharmaceutical company is testing two new compounds intended to reduce blood-pressure levels. The compounds are administered to two different sets of lab animals. In Group 1, 71 of 100 animals tested respond to drug 1 with lower blood-pressure levels. In Group 2, 58 of 90 animals tested respond to drug 2 with lower blood-pressure levels. The company wants to test at the .05 level whether drug 1 is more effective in reducing blood pressure levels than drug 2.

48. The problem involves which of the following procedures?

a. Comparison of two population proportions

b. Comparison of means of two populations using dependent samples

c. Comparison of means of two populations using independent samples

d. Comparison of variances of two populations

49. Using the following subscripts for the two groups: 1- drug 1; 2- drug 2, which of the following is the most appropriate pair of hypotheses?

a. H0: μ1 ≤ μ2 H1: μ1 > μ2

b. H0: μ1 ≥ μ2 H1: μ1 < μ2

c. H0: π1 ≤ π2 H1: π1 > π2

d. H0: π1 ≥ π2 H1: π1 < π2

e. H0: π1 = π2 H1: π1 ≠ π2
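For the two-drug comparison, a possible calculation is sketched below. It assumes the pooled-proportion form of the two-sample z test and an upper-tail alternative (drug 1 more effective than drug 2). The pooled standard error is one common textbook choice and is an assumption here.

```python
import math
from statistics import NormalDist

x1, n1 = 71, 100   # responders and group size, drug 1
x2, n2 = 58, 90    # responders and group size, drug 2
alpha = 0.05

p1 = x1 / n1
p2 = x2 / n2

# ASSUMPTION: pooled estimate of the common proportion under H0.
p_pooled = (x1 + x2) / (n1 + n2)
std_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z_stat = (p1 - p2) / std_error

z_crit = NormalDist().inv_cdf(1 - alpha)   # upper-tail critical value
reject_null = z_stat > z_crit

print(f"p1 = {p1:.3f}, p2 = {p2:.3f}, pooled = {p_pooled:.3f}")
print(f"z = {z_stat:.3f}, critical z = {z_crit:.3f}, reject H0: {reject_null}")
```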

50. (Bonus) On which day is Thanksgiving celebrated?

a. Monday

b. Tuesday

c. Wednesday

d. Thursday

e. Friday

BA 282: Applied Business Statistics

Final Exam -- Spring 1999 Name ______________________

1. Find the value of χ².05,20

2. Find the value of F.05,2,10

3. In ANOVA, we will tend to not reject the null hypothesis of equal population means whenever the calculated F is:

a. small

b. large

c. equal to the critical F

d. none of the above is correct

4. In a two-way ANOVA, in the x_ijk = μ + α_i + β_j + (αβ)_ij + ε_ijk model, the term (αβ)_ij represents

a. random error in the sampling process

b. the effect that is due to factor A

c. the effect that is due to factor B

d. the interaction effect between level i of factor A and level j of factor B

e. the level of significance at which the null hypothesis is rejected

5. Which of the following is a typical source of internal secondary data for business research?

a. accounting or financial reports

b. sales information

c. production data

d. all of the above

[pic]

For questions 6 to 10, refer to the plot on the right.

6. The equation for the line going through the points would take the form of:

a. Y = a + b+ c

b. Y = a - bX

c. Y = x -1

d. Y = a + bX²

e. None of these is correct

7. In this particular problem, the researcher is trying to predict:

a. Quantity demanded based on price

b. Price based on quantity demanded

c. Both price and quantity demanded

d. None of these is correct

8. If computed, the sign of b in the equation would be:

a. Either positive or negative

b. Positive

c. Negative

d. None of the above

9. The correlation coefficient of the problem, if computed, could be:

a. 1.00

b. 0

c. -1.0

d. None of the above

10. Which of the following won't be true for the regression resulting from the data in the plot above?

a. r² = 100%

b. r = 1

c. sy.x = 0

d. ANOVA p-value = 0

e. All of the above are true

11. In multiple regression analysis, multicollinearity means:

a. High correlation between the dependent variable and the independent variables

b. A high correlation between the response variable and some independent variables

c. A condition where 2 or more of the independent variables are highly correlated with each other

d. None of the above

Use the following regression output to answer questions 12 to 15.

The regression equation is

sales = 46.5 + 52.6 ad

|Predictor |Coef |Stdev |t-ratio |p |

|Constant |46.486 |9.885 |4.70 |0.000 |

|ad |52.57 |10.26 |5.12 |0.000 |

s = 6.837 R-sq = 76.6% R-sq(adj) = 73.7%

Analysis of Variance

|SOURCE |DF |SS |MS |F |p |

|Regression |1 |1226.9 |1226.9 |26.25 |0.000 |

|Error |8 |374.0 |46.7 | | |

|Total |9 |1600.9 | | | |

12. Identify the coefficient of determination __________

13. Write the standard deviation of the slope (sb )____________

14. Write the standard error of the estimate (sy.x)_________

15. Identify the slope ______________
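The quantities asked for in questions 12 to 15 are linked by simple identities, which can be verified from the ANOVA table. The sketch below only re-derives numbers already printed in the output. The variable names are illustrative.

```python
import math

# Values copied from the regression output above.
ss_regression, ss_error, ss_total = 1226.9, 374.0, 1600.9
df_regression, df_error = 1, 8
slope, slope_stdev = 52.57, 10.26

r_squared = ss_regression / ss_total          # coefficient of determination
ms_error = ss_error / df_error                # mean square error
std_error_estimate = math.sqrt(ms_error)      # s, the standard error of the estimate
f_ratio = (ss_regression / df_regression) / ms_error
t_ratio = slope / slope_stdev                 # t-ratio for the slope

print(f"R-sq = {r_squared:.1%}, s = {std_error_estimate:.3f}")
print(f"F = {f_ratio:.2f}, t = {t_ratio:.2f}, t squared = {t_ratio ** 2:.2f}")
```

In a simple regression the squared t-ratio of the slope matches the ANOVA F, up to rounding in the printed output.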

16. List ONE of the 3 assumptions that underlie the simple regression model ___________

17. Suppose you wished to investigate the effect of consumption of alcohol (Y/N) and a common over-the-counter cold medicine (Y/N) on a person's reaction time. The appropriate statistical procedure for this experiment is:

a. Chi-square test of independence

b. Analysis of Variance (Two factor)

c. Analysis of Variance (One factor)

d. Regression analysis

e. Discriminant analysis

18. Which of the following is not a linear function?

a. Y = a + bX

b. Distance = Rate of speed × Travel time

c. Total Profit = Profit per unit × Number of units sold

d. Total Cost = Fixed Cost + Unit Variable Cost × Quantity Produced

e. All of the above are linear functions

19. There are two main uses of a multiple linear model: 1) for slope analysis, or 2) for estimating the value of Y given values of the X's. For which of the two uses is multicollinearity not a problem? _____________

20. The _____ interval is the interval that contains the actual average value of the response variable for a specific value of X

a. Confidence

b. Prediction

21. The _____ interval is the interval that contains the actual value of the response variable for a specific value of X

a. Confidence

b. Prediction

22. Residual is also known as:

a. Error

b. Actual Y - Estimated Y

c. Observed Y - Fitted Y

d. None of the above

e. All of the above

PROBLEM 1:

The manager of Ryerson Coil Pickling wishes to know how the level of the pickling operation (measured in tons) affects the monthly overtime expense of the plant. He collects data for the last 17 months on actual tonnage processed and overtime cost. He then performs regression analysis on the data. Using the attached output, answer questions 23 to 37.

Partial Data:

|Month |1 |2 |3 |. . . |17 |

|Production (Tons) |29,668 |23,577 |27,117 |. . . |19,365 |

|Overtime Expense |$11,000 |$8,000 |$9,000 |. . . |$7,000 |

23. What is the response variable in this problem? __________

24. What is(are) the independent variable(s)? ______________

25. Give the linear regression equation generated by the 17 observations. __________

26. Using the regression equation, estimate the plant’s overtime expense for a month where 30,000 tons of steel is planned to be processed. __________

27. The correlation between the response variable and the predictor variables could be best described as:

a. Perfectly positively linear

b. Perfectly negatively linear

c. Positively correlated

d. Negatively correlated

28. For the planned production described in #26, give the 95% interval estimate for the plant’s actual overtime expense. ____________________

29. How much of the variation in overtime expense is explained by production level? _____________

30. For every ton increase in processing level, the plant's overtime expense is expected to:

a. increase by $0.587

b. increase by $5.87

c. decrease by $0.587

d. decrease by $5.87

e. None of the above

31. If for the year 2000 the plant plans to process 30,000 tons of coils each month, give the 95% interval estimate of the actual average monthly overtime expense. ____________

32. What is the coefficient of correlation of this model?_____________

33. Using the slope test set up the null and alternative hypotheses to determine whether the model you identified in #25 is significant.

34. Write the decision rule for the hypotheses in #33 ______________

35. Identify the computed and critical values corresponding to the decision rule in #34. ________, ________

36. Based on #35, is the model significant? __________

37. Compute the 95% confidence interval for the actual change in overtime expense for every ton increase in production level. _____________

PROBLEM 2:

In a recent survey of winter 1999 BA 282 students, 41 students from the Medford section and the Ashland sections responded. One of the objectives of the survey was to investigate what factors could possibly affect students’ success in the course's midterm exam. A regression analysis was performed in which 4 explanatory variables were included in the model. The variables were the following:

GPA – student’s overall GPA to date

243GRADE – student’s final grade in the prerequisite course, MA 243

243WHEN – the number of terms ago the student took MA 243

WHERE – where the student is currently taking BA 282 (coded 0 for the Ashland section, 1 for the Medford section)

Use the attached regression output in answering questions 38 to 49

38. What is the dependent variable in this problem? __________

39. What are the predictor variables? ______________

40. Give the linear regression equation generated by the 41 observations. __________

41. Give an estimated midterm grade for a student in the Medford section who earned a 3.5 grade in MA 243 one term ago and currently holds a 3.25 overall GPA. __________

42. How much of the variation in BA 282 midterm exam can be explained by the regression equation? _____________

43. For every unit increase in MA 243 grade, BA 282 midterm grade is estimated to

a. increase by .330 units

b. decrease by .330 units

c. increase by .781 units

d. decrease by .781 units

e. None of the above

44. Compute the 80% confidence interval for the actual slope of the variable MA 243. ___________

45. What is the multiple coefficient of determination of this model? __________

46. Using the ANOVA test set up the null and alternative hypotheses to determine whether the model is significant.

47. Write the decision rule for the hypotheses in #46 ______________

48. Identify the computed and critical values (use a 0.05 significance level) corresponding to the decision rule in #47 ________, ________

49. Based on #48, is the model significant? __________

PROBLEM 3:

Given a significance level of 0.05, is there a significant difference in the average midterm grades of the students in the 3 sections of BA 282? (output attached)

50. The most appropriate testing procedure for the problem stated above is:

a. Chi-square test of independence

b. Regression and correlation

c. Discriminant analysis

d. Oneway ANOVA

e. Twoway ANOVA

51. Write the appropriate hypotheses and decision rule for comparing the average test scores of the three sections.

52. Give the computed and critical F values for carrying out the test in #51 ____________, ____________

53. At 0.05 significance level, which of the three sections has the largest average test scores (note: your answer here should be consistent with your answer to #51 and 52)? _______

PROBLEM 4:

A test was conducted to determine whether the grade in MA 243, or when MA 243 was taken, has any effect on the midterm grades in BA 282. Also of interest was whether the grade in MA 243 and the time it was taken have an interaction effect on the BA 282 midterm grades. Before the ANOVA test was conducted, the raw data were recoded: MA 243 grades were reclassified into two groups (A’s and Non A’s), and the term it was taken was also reclassified into two groups (one term ago and two or more terms ago). Grades in the BA 282 midterm (not changed) are on a 4 to 0 scale, representing A to F letter grades. Also, since two-way ANOVA with replication requires that each cell contain equal samples, six students from each combination of MA 243 grade and term group were randomly selected to fill the cells. The resulting crosstabulation of the observations is shown below:

|MA 243 Grade |MA 243 Taken One Term Ago |MA 243 Taken Two or More Terms Ago |

|A |3,4,3,2,4,3 |2,3,3,2,4,3 |

|Non A's |3,0,2,1,2,3 |1,1,1,2,1,1 |

Use 0.05 significance level for the following tests.

54. Write the appropriate hypotheses for testing whether MA 243 grade has an effect on BA 282 midterm exam. ________________

55. Does MA 243 grade have a significant effect on BA 282 midterm grades? If yes, which grade category performs better? _____________

56. Write the appropriate hypotheses for testing whether when MA 243 was taken has an effect on BA 282 midterm exam. ________________

57. Does the time when MA 243 was taken have a significant effect on BA 282 midterm grade? If yes, which time has higher average midterm grades? _____________

58. Is there a significant interaction between MA 243 grade and the time when it was taken on BA 282 midterm scores?

59 and 60 BONUS

Problem 1:

MTB > Regress 'P_overt' 1 'Ton_prod';

SUBC> Predict 30000.

Regression Analysis

The regression equation is

P_overt = - 6776 + 0.587 Ton_prod

|Predictor |Coef |StDev |T |P |

|Constant |-6776 |4178 |-1.62 |0.126 |

|Ton_prod |0.5868 |0.1760 |3.33 |0.005 |

S = 2566 R-Sq = 42.6% R-Sq(adj) = 38.7%

Analysis of Variance

|Source |DF |SS |MS |F |P |

|Regression |1 |73217721 |73217721 |11.12 |0.005 |

|Residual Error |15 |98782279 |6585485 | | |

|Total |16 |172000000 | | | |

Predicted Values

|Fit |StDev Fit |95.0% CI |95.0% PI |

|10828 |1306 |(8044, 13611) |(4691, 16965) |
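Two of the numbers in the Problem 1 output can be reproduced by hand. The sketch below recomputes the fitted value at 30,000 tons and a 95% confidence interval for the slope. The critical value t = 2.131 (15 degrees of freedom, 0.025 in each tail) is taken from a standard t table and is an input assumed here, since it does not appear in the Minitab output.

```python
# Coefficients copied from the Minitab output for Problem 1.
intercept = -6776.0
slope = 0.5868
slope_stdev = 0.1760

tons_planned = 30000
fitted_overtime = intercept + slope * tons_planned   # point estimate of overtime expense

# ASSUMPTION: t critical value for 95% confidence with 15 df, read from a t table.
t_crit = 2.131
slope_margin = t_crit * slope_stdev
slope_ci = (slope - slope_margin, slope + slope_margin)

print(f"fitted overtime expense at {tons_planned} tons: {fitted_overtime:.0f}")
print(f"95% CI for the slope: ({slope_ci[0]:.4f}, {slope_ci[1]:.4f})")
```

The fitted value agrees with the Fit column of the output (10828). The interval for the slope does not contain zero, which is consistent with the slope's p-value of 0.005.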

PROBLEM 2:

MTB > Regress 'MIDTERM' 4 'GPA' '243GRADE' '243WHEN' 'WHERE';

SUBC> Constant;

SUBC> Brief 2.

The regression equation is

MIDTERM = - 0.92 + 0.781 GPA + 0.330 243GRADE - 0.242 243WHEN + 0.601 WHERE

41 cases used 5 cases contain missing values

|Predictor |Coef |StDev |T |P |

|Constant |-0.923 |1.020 |-0.90 |0.371 |

|GPA |0.7813 |0.3569 |2.19 |0.035 |

|243GRADE |0.3302 |0.1977 |1.67 |0.104 |

|243WHEN |-0.2421 |0.1339 |-1.81 |0.079 |

|WHERE |0.6011 |0.3957 |1.52 |0.137 |

S = 0.9111 R-Sq = 40.7% R-Sq(adj) = 34.1%

Analysis of Variance

|Source |DF |SS |MS |F |P |

|Regression |4 |20.5079 |5.1270 |6.18 |0.001 |

|Residual Error |36 |29.8823 |0.8301 | | |

|Total |40 |50.3902 | | | |
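A multiple regression equation is used for prediction by plugging in one value per predictor. The sketch below applies the Problem 2 coefficients to the student described in question 41, assuming the coding stated in the variable list (WHERE = 1 for Medford) and 243WHEN = 1 for "one term ago". The dictionary layout is illustrative only.

```python
# Coefficients copied from the Minitab output for Problem 2.
coefficients = {
    "Constant": -0.923,
    "GPA": 0.7813,
    "243GRADE": 0.3302,
    "243WHEN": -0.2421,
    "WHERE": 0.6011,
}

# Student from question 41. ASSUMPTION: WHERE = 1 means Medford, 243WHEN = 1 means one term ago.
student = {"GPA": 3.25, "243GRADE": 3.5, "243WHEN": 1, "WHERE": 1}

predicted_midterm = coefficients["Constant"] + sum(
    coefficients[name] * value for name, value in student.items()
)

# The ANOVA F ratio is the regression mean square over the error mean square.
f_ratio = 5.1270 / 0.8301

print(f"predicted midterm grade: {predicted_midterm:.2f}")
print(f"F = {f_ratio:.2f}")
```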

Problem 3

Analysis of Variance for MIDTERM

|Source |DF |SS |MS |F |P |

|SECTION |2 |4.48 |2.24 |1.91 |0.160 |

|Error |42 |49.17 |1.17 | | |

|Total |44 |53.64 | | | |

Individual 95% CIs For Mean

Based on Pooled StDev

Level N Mean StDev --+---------+---------+---------+----

MW 12 1.750 1.055 (---------*----------)

TR-A 21 2.000 1.225 (-------*-------)

TR-M 12 2.583 0.793 (---------*----------)

--+---------+---------+---------+----

Pooled StDev = 1.082 1.20 1.80 2.40 3.00
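The one-way ANOVA table for Problem 3 can be rebuilt from the group summary statistics alone. The sketch below assumes the printed means and standard deviations are accurate to the digits shown, so the results match the Minitab table only up to rounding.

```python
import math

# (section, n, mean, standard deviation) copied from the output above.
groups = [
    ("MW", 12, 1.750, 1.055),
    ("TR-A", 21, 2.000, 1.225),
    ("TR-M", 12, 2.583, 0.793),
]

n_total = sum(n for _, n, _, _ in groups)
grand_mean = sum(n * mean for _, n, mean, _ in groups) / n_total

# Between-groups and within-groups sums of squares.
ss_between = sum(n * (mean - grand_mean) ** 2 for _, n, mean, _ in groups)
ss_within = sum((n - 1) * sd ** 2 for _, n, _, sd in groups)

df_between = len(groups) - 1
df_within = n_total - len(groups)

ms_within = ss_within / df_within
f_ratio = (ss_between / df_between) / ms_within
pooled_stdev = math.sqrt(ms_within)

print(f"SS between = {ss_between:.2f}, SS within = {ss_within:.2f}")
print(f"F = {f_ratio:.2f}, pooled StDev = {pooled_stdev:.3f}")
```

Because the p-value in the output (0.160) exceeds 0.05, the differences among the three section means are not statistically significant at that level.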

Problem 4

Two-way Analysis of Variance

Analysis of Variance for MIDTERM

|Source |DF |SS |MS |F |P |

|ma243 |1 |13.500 |13.500 |20.25 |0.000 |

|ma243whe |1 |1.500 |1.500 |2.25 |0.149 |

|Interaction |1 |0.167 |0.167 |0.25 |0.623 |

|Error |20 |13.333 |0.667 | | |

|Total |23 |28.500 | | | |
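The two-way ANOVA table above can be reproduced directly from the 24 observations in the Problem 4 crosstabulation. The sketch below uses the standard sums-of-squares decomposition for a balanced design with replication. The labels "nonA", "one" and "two+" are shorthand introduced here for the row and column categories.

```python
from statistics import mean

# Midterm grades by (MA 243 grade, when MA 243 was taken), copied from the crosstabulation.
cells = {
    ("A", "one"): [3, 4, 3, 2, 4, 3],
    ("A", "two+"): [2, 3, 3, 2, 4, 3],
    ("nonA", "one"): [3, 0, 2, 1, 2, 3],
    ("nonA", "two+"): [1, 1, 1, 2, 1, 1],
}
grades = ["A", "nonA"]
terms = ["one", "two+"]
n_cell = 6  # observations per cell (balanced design)

all_values = [x for values in cells.values() for x in values]
grand_mean = mean(all_values)

def level_mean(position, level):
    """Mean of all observations whose cell key has `level` at `position`."""
    return mean(x for key, values in cells.items() if key[position] == level for x in values)

ss_grade = n_cell * len(terms) * sum((level_mean(0, g) - grand_mean) ** 2 for g in grades)
ss_when = n_cell * len(grades) * sum((level_mean(1, w) - grand_mean) ** 2 for w in terms)
ss_cells = n_cell * sum((mean(values) - grand_mean) ** 2 for values in cells.values())
ss_interaction = ss_cells - ss_grade - ss_when
ss_total = sum((x - grand_mean) ** 2 for x in all_values)
ss_error = ss_total - ss_cells

df_error = len(all_values) - len(cells)   # 24 observations minus 4 cells
ms_error = ss_error / df_error

# Each effect has 1 degree of freedom here, so its mean square equals its sum of squares.
f_grade = ss_grade / ms_error
f_when = ss_when / ms_error
f_interaction = ss_interaction / ms_error

print(f"SS: grade = {ss_grade:.3f}, when = {ss_when:.3f}, "
      f"interaction = {ss_interaction:.3f}, error = {ss_error:.3f}")
print(f"F: grade = {f_grade:.2f}, when = {f_when:.2f}, interaction = {f_interaction:.2f}")
```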

Individual 95% CI

ma243 Mean ----+---------+---------+---------+-------

A 3.00 (-------*-------)

B,C,D 1.50 (-------*-------)

----+---------+---------+---------+-------

1.20 1.80 2.40 3.00

Individual 95% CI

ma243whe Mean ---+---------+---------+---------+--------

One Term 2.50 (------------*-----------)

Two and 2.00 (-----------*-----------)

---+---------+---------+---------+--------

1.60 2.00 2.40 2.80

-----------------------

Use for generating means, standard deviation, etc

Use for testing a population mean

(n ≥ 30 or σ known)

Use for testing a population proportion (approximation to the binomial)

Use for testing a population mean

(n < 30, σ unknown, and normal)

Use for comparing proportions of two populations

Use for comparing means of two DEPENDENT samples

Use for comparing means of two INDEPENDENT samples

Use for testing variances (or standard deviation) of two groups

Use for generating interactions plot for two-way ANOVA with replication

Use for two-way ANOVA with replication

Use for One-way ANOVA

(procedure for comparing means of two or more independent samples)

Use for performing Chisquare test using RAW data (procedure for testing whether two qualitative variables are independent)

Use for performing Chisquare test using TABULATED data

Use for generating output for regression and correlation analysis

(for simple and multiple models)

Use for regression and correlation analysis

Use for comparing means of two DEPENDENT samples

Use for comparing means of two INDEPENDENT samples

(n < 30 and equal variances)

Use for comparing means of two INDEPENDENT samples

(n < 30 and unequal variances)

Use for comparing means of two INDEPENDENT samples

(n ≥ 30)
