Mendocino College



Online Assignment #9: Analysis of Variance

This final technique of inference that we’re going to cover is similar to goodness-of-fit, but where goodness-of-fit looks at the different proportions (fractions) of data in different categories, analysis of variance (nicknamed ANOVA) looks at the different means that different categories have to determine whether the population means are different, based on the sample means.

Here’s an example. I’m interested in finding out whether differences in attitudes toward math classes reflect the age of the students. I’m going to claim that age affects attitudes toward math classes, or in other words that the populations (negative, neutral, positive) have different means. So I take the five responses (1 and 2 negative, 3 neutral, and 4 and 5 positive), and note the age of each person.

 #  Math  Age     #  Math  Age     #  Math  Age     #  Math  Age
 1    3    18    21    4    18    41    3    19    61    2    46
 2    3    19    22    3    19    42    4    18    62    3    18
 3    2    19    23    3    19    43    4    27    63    4    49
 4    1    20    24    3    19    44    2    34    64    3    40
 5    3    20    25    2    21    45    3    35    65    2    19
 6    4    18    26    2    31    46    4    29    66    1    28
 7    4    18    27    4    62    47    2    19    67    1    31
 8    1    21    28    2    21    48    2    18    68    2    25
 9    4    19    29    4    20    49    4    47    69    4    30
10    2    20    30    3    21    50    3    18    70    3    17
11    4    18    31    3    17    51    2    18    71    2    20
12    4    25    32    3    20    52    4    27    72    3    17
13    2    19    33    3    45    53    2    20    73    4    16
14    3    38    34    2    19    54    3    18    74    4    17
15    3    18    35    3    20    55    2    51    75    2    18
16    3    19    36    4    18    56    2    38    76    2    42
17    3    18    37    2    22    57    3    20    77    3    42
18    2    18    38    3    18    58    2    37    78    1    26
19    3    17    39    2    18    59    4    37    79    4    19
20    1    23    40    2    19    60    4    18    80    4    23
81    2    19

It’s a little easier to do without all the extra columns to confuse us. You put all the ages of people whose attitudes were negative in one list, neutral in another, and positive in a third list. You’ll know you did it right if the mean age of the negatives is 25, the neutrals 22.185185…, and the positives 26.0454545…

At this point we might want to make a theory about the differences in these means. Here’s one: the youngest group is the one that was neutral, because those students haven’t yet taken college math and might not have formed a pro or con attitude, whereas the older ones are more likely to have decided that they either like or dislike math.

I know, it sounds a little farfetched to me too.
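For anyone who would like to check the three lists away from the calculator, here is a minimal Python sketch of the sorting step described above. The ratings and ages are transcribed from the table in respondent order; the names ratings, ages, attitude, groups, and group_means are made up for this illustration and are not part of the assignment.

```python
# Attitude ratings (1-2 negative, 3 neutral, 4-5 positive) and ages,
# transcribed from the table above in respondent order (#1 through #81).
ratings = [int(d) for d in (
    "3321344142" "4423333231" "4333224243" "3332342322"
    "3442342243" "2423223244" "2343211243" "2344223144" "2"
)]
ages = [
    18, 19, 19, 20, 20, 18, 18, 21, 19, 20,   # 1-10
    18, 25, 19, 38, 18, 19, 18, 18, 17, 23,   # 11-20
    18, 19, 19, 19, 21, 31, 62, 21, 20, 21,   # 21-30
    17, 20, 45, 19, 20, 18, 22, 18, 18, 19,   # 31-40
    19, 18, 27, 34, 35, 29, 19, 18, 47, 18,   # 41-50
    18, 27, 20, 18, 51, 38, 20, 37, 37, 18,   # 51-60
    46, 18, 49, 40, 19, 28, 31, 25, 30, 17,   # 61-70
    20, 17, 16, 17, 18, 42, 42, 26, 19, 23,   # 71-80
    19,                                       # 81
]


def attitude(rating):
    """Map a 1-5 rating to the three categories used in the text."""
    if rating <= 2:
        return "negative"
    if rating == 3:
        return "neutral"
    return "positive"


# One list of ages per attitude, like the three calculator lists.
groups = {"negative": [], "neutral": [], "positive": []}
for rating, age in zip(ratings, ages):
    groups[attitude(rating)].append(age)

# Sample mean of each list.
group_means = {name: sum(a) / len(a) for name, a in groups.items()}
for name, a in groups.items():
    print(f"{name:8s} n={len(a):2d} mean age={group_means[name]:.4f}")
```

With the table as printed here, the negative and positive lists reproduce the means of 25 and 26.04545… quoted above. The neutral list comes out near 22.56 rather than 22.185…, a gap of exactly 10 years in that list’s total, so one printed neutral age may differ from the value that was originally entered.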
But here’s the thing: there’s no point in theorizing or speculating until you know whether the differences in age are big enough to say that they weren’t a chance occurrence. This is the meaning of the p-value: how likely it is that a result like this could have been a random happening, not at all related to the attitude toward math classes.

And the method we use to find the p-value in this situation is analysis of variance, which gets its name from how we decide if the x̄’s are different enough. We compare the variability (variance) between the groups to the variability (variance) within the groups. (For more information, read Lecture #24.)

The distribution we use is called F, and it has two degrees-of-freedom numbers, one for between the groups and one for within them (but let’s not worry about that!). This graph shows several F-distributions:

[Graph: several F-distributions]

They look somewhat like χ² in the whole ski-jump shape, but the hump doesn’t keep moving to the right as you add degrees of freedom; it stays around 1.

We’re going to let the calculator do the work. All we have to do is put the ages in the three lists and then tell the calculator which lists to look at.

The test we want is at the very bottom of the Stat Test submenu:

[Calculator screen: Stat Test submenu]

Selecting ANOVA( takes you back to your home screen:

[Calculator screen: home screen showing ANOVA(]

Just enter the names of the lists you want analyzed, separated by commas:

[Calculator screen: ANOVA( followed by the list names]

Press Enter, and you should get this:

[Calculator screen: ANOVA results]

It’s the p we’re interested in, as usual. But let me explain Factor and Error: Factor is the variability between the groups, i.e., their x-bars, and Error is the variability within the groups, i.e., their standard deviations.

The p-value is a lot bigger than our cut-off, the significance level α, which I set at 5%. So our claim that the populations (negative, neutral, positive) have different means is not supported. Our speculation about why they’re different was premature, because it turned out that differences like the ones we found would happen 37% of the time just by chance.

Assignment

1) You’re going to do one analysis-of-variance problem.
Here’s the idea: books with different numbers of pages are going to have different thicknesses on average. You already showed this in Online Assignment #7, when you studied the correlation between the number of pages and the thickness of a book, and you used a regression equation to make predictions about thicknesses. Now we’re going to look at the connection a little differently.

Find the five-number summary for the numbers of pages in your project (you know, the things at the bottom of 1-Var Stats: minX, Q1, Med, Q3, and maxX). These five numbers split the books into four groups: those whose numbers of pages are 1) between minX and Q1, 2) between Q1 and Med, 3) between Med and Q3, and 4) between Q3 and maxX. I would think that the average thickness of the four groups would not be the same (actually I think the averages would go from smallest to largest as you went from Group 1 to Group 4, but we’ll just settle for claiming that the average thicknesses for the four groups are not all the same).

Now look at each book in Group 1, and put its thickness in L1. Put the thicknesses of the books in Group 2 in L2, and so on.

You’re testing the claim that the average thicknesses of the books in the four groups are not all the same.

a) Report the x̄’s you got for the four different groups.

b) Go to Stat Test ANOVA, and enter the names of Lists 1 through 4, separated by commas. Report the p-value you got. If it’s less than 0.05 you can support your claim, otherwise not. If your p-value is less than 0.05, make a comment about why the means would not all be the same.
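If you’d like to double-check the calculator’s answer, the Factor-versus-Error comparison described earlier can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how the calculator works internally: the function names one_way_anova, f_pdf, and p_value are invented for the sketch, the four thickness lists are made-up numbers (use your own L1 through L4), and the p-value is approximated by adding up the area under the F curve with Simpson’s rule.

```python
import math


def one_way_anova(*groups):
    """Return (F, df_factor, df_error) for two or more lists of numbers."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Factor: variability between the group means.
    ss_factor = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Error: variability within each group, around its own mean.
    ss_error = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_factor, df_error = k - 1, n - k
    f_stat = (ss_factor / df_factor) / (ss_error / df_error)
    return f_stat, df_factor, df_error


def f_pdf(x, d1, d2):
    """Height of the F(d1, d2) curve at x (this sketch assumes d1 >= 2)."""
    if x == 0:
        return 1.0 if d1 == 2 else 0.0
    log_pdf = (
        math.lgamma((d1 + d2) / 2) - math.lgamma(d1 / 2) - math.lgamma(d2 / 2)
        + (d1 / 2) * math.log(d1 / d2)
        + (d1 / 2 - 1) * math.log(x)
        - ((d1 + d2) / 2) * math.log1p(d1 * x / d2)
    )
    return math.exp(log_pdf)


def p_value(f_stat, d1, d2, steps=10_000):
    """Approximate area to the right of f_stat, using Simpson's rule on [0, f_stat]."""
    h = f_stat / steps
    total = f_pdf(0, d1, d2) + f_pdf(f_stat, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return max(0.0, 1 - total * h / 3)


# Hypothetical thicknesses (cm) for the four page-count groups; not real data.
L1 = [0.8, 1.1, 0.9, 1.3, 1.0]
L2 = [1.4, 1.2, 1.7, 1.5, 1.6]
L3 = [2.0, 1.8, 2.3, 1.9, 2.4]
L4 = [2.9, 3.4, 2.6, 3.1, 3.8]

f_stat, df_factor, df_error = one_way_anova(L1, L2, L3, L4)
p = p_value(f_stat, df_factor, df_error)
print(f"F = {f_stat:.3f}, df = ({df_factor}, {df_error}), p = {p:.4f}")
```

The F printed here is the Factor mean square divided by the Error mean square, and p is the figure you compare with 0.05. The area calculation is approximate, so small differences from the calculator’s last digits are to be expected.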