CSSS 508: Intro R


Lab 6: Plotting Practice

We’re going to use plotting to do some exploratory data analysis on a few datasets.

The Datasets:

Recall that R has many datasets in its MASS package.

> library(MASS)

We can use the help pages to list them.
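As a minimal sketch of one way to list them: data() with a package argument returns a listing whose results matrix has one row per dataset. The variable name mass.data is only illustrative.

```r
library(MASS)

# data() with a package argument returns a listing object; its "results"
# matrix has one row per dataset, including "Item" (the dataset name)
# and "Title" (a short description).
mass.data <- data(package = "MASS")$results[, c("Item", "Title")]
head(mass.data)

# Each dataset also has its own help page, for example:
# ?anorexia
```

Typing library(help = "MASS") shows a similar index for the whole package.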

First Example: Anorexia Data on Weight Change: anorexia

> dim(anorexia)

[1] 72 3

72 Subjects

Three variables: Treatment Group, Preweight, Postweight

Treatment Group: Cont (Control), CBT (Cognitive Behavioral Trt), FT (Family Trt)

> tr.group <- anorexia$Treat

> table(tr.group)

tr.group

 CBT Cont   FT

  29   26   17

We can use scatterplots to look at the preweights vs. the postweights. If a person has gained weight, their postweight will be higher than their preweight (i.e., their point will lie above the line y = x).

> pre.wt <- anorexia$Prewt

> post.wt <- anorexia$Postwt

> plot(pre.wt,post.wt)

> plot(pre.wt,post.wt,type="n")

> points(pre.wt[tr.group=="Cont"],post.wt[tr.group=="Cont"],col=2)

> points(pre.wt[tr.group=="CBT"],post.wt[tr.group=="CBT"],col=3)

> points(pre.wt[tr.group=="FT"],post.wt[tr.group=="FT"],col=4)

> abline(0,1)

(abline adds a line to the plot – here with an intercept of 0 and a slope of 1)

> title("Comparing Weights by Treatment Group")
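The same picture can be drawn in a single plot() call by giving each point its own colour. This is only a sketch of an alternative to the points() calls above; the names group.cols and pt.cols, the axis labels, and the legend position are illustrative choices, not part of the lab.

```r
library(MASS)

pre.wt   <- anorexia$Prewt
post.wt  <- anorexia$Postwt
tr.group <- anorexia$Treat

# Same colour codes as the points() calls: Cont = 2, CBT = 3, FT = 4.
group.cols <- c(Cont = 2, CBT = 3, FT = 4)

# Look up each subject's colour by the name of their treatment group.
pt.cols <- group.cols[as.character(tr.group)]

plot(pre.wt, post.wt, col = pt.cols,
     xlab = "Preweight", ylab = "Postweight",
     main = "Comparing Weights by Treatment Group")
abline(0, 1)   # the line y = x: points above it gained weight
legend("topleft", legend = names(group.cols), col = group.cols, pch = 1)
```

Indexing by as.character(tr.group) rather than by the factor itself matters here: a factor would index by its internal level order (CBT, Cont, FT), not by name.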

We really want to study the difference in the two weights.

> wt.change <- post.wt - pre.wt

> summary(wt.change)

Min. 1st Qu. Median Mean 3rd Qu. Max.

-12.200 -2.225 1.650 2.764 9.100 21.500
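A quick way to see how many subjects in each group are above the line y = x is to tabulate the sign of the weight change by treatment group. This is a small sketch; the names gained and gain.table are illustrative.

```r
library(MASS)

wt.change <- anorexia$Postwt - anorexia$Prewt

# TRUE for subjects whose postweight exceeds their preweight.
gained <- wt.change > 0

# Rows: treatment group; columns: FALSE (no gain) and TRUE (gain).
gain.table <- table(anorexia$Treat, gained)
gain.table
```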

So what is the average weight change in each treatment group?

> mean(wt.change[tr.group=="Cont"])

[1] -0.45

> mean(wt.change[tr.group=="CBT"])

[1] 3.006897

> mean(wt.change[tr.group=="FT"])

[1] 7.264706
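The three mean() calls can also be collapsed into one call with tapply(), which applies a function to a vector separately within each level of a factor. A minimal sketch (group.means is an illustrative name):

```r
library(MASS)

wt.change <- anorexia$Postwt - anorexia$Prewt

# Apply mean() to wt.change within each treatment group.
group.means <- tapply(wt.change, anorexia$Treat, mean)
group.means
```

The result is a named vector, so a single group's mean can be pulled out with, for example, group.means["FT"].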

But the mean by itself doesn’t give us the whole picture.

> boxplot(wt.change[tr.group=="Cont"],wt.change[tr.group=="CBT"],wt.change[tr.group=="FT"],names=c("Control","CBT","FT"))

> title("Wt Change by Treatment Group")

So the control group's weight changes appear to be centered near zero. The majority of the CBT group falls in a range similar to that of about half of the control group. The FT group lies almost entirely above the other two treatment groups.
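The same boxplot can be written with R's formula interface, which splits the response by a factor in one argument. The sketch below also adds a dashed reference line at zero to make "centered at zero" easier to judge; the reordered factor, the axis labels, and the reference line are additions for illustration.

```r
library(MASS)

wt.change <- anorexia$Postwt - anorexia$Prewt

# Reorder the factor levels so the boxes appear as Control, CBT, FT.
tr.group <- factor(anorexia$Treat, levels = c("Cont", "CBT", "FT"))

# wt.change ~ tr.group means "wt.change split by tr.group".
bp <- boxplot(wt.change ~ tr.group, names = c("Control", "CBT", "FT"),
              xlab = "Treatment Group", ylab = "Weight Change",
              main = "Wt Change by Treatment Group")
abline(h = 0, lty = 2)   # reference line: no weight change
```

boxplot() invisibly returns the numbers behind the picture; bp$stats holds the whisker ends, quartiles, and median for each group.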

Second Example: Scottish Hill Races: hills

Record Times in 1984 for 35 Scottish hill races.

dist: distance in miles

climb: total height gained during the route in feet

time: record time in minutes

> dist <- hills$dist

> climb <- hills$climb

> time <- hills$time

> summary(dist)

Min. 1st Qu. Median Mean 3rd Qu. Max.

2.000 4.500 6.000 7.529 8.000 28.000

> summary(climb)

Min. 1st Qu. Median Mean 3rd Qu. Max.

300 725 1000 1815 2200 7500

> summary(time)

Min. 1st Qu. Median Mean 3rd Qu. Max.

15.95 28.00 39.75 57.88 68.63 204.60

> par(mfrow=c(1,3))

> boxplot(dist,ylab="Distance in Miles")

> boxplot(climb,ylab="Height Gained in Feet")

> boxplot(time,ylab="Record Time in Minutes")

The function pairs() plots scatterplots of all pairs of variables. We have three variables, so there are 3*2 = 6 panels to plot: each pair of variables appears twice, once with each variable on the x-axis.

> pairs(hills)

The names of the variables are on the diagonal. A panel's position indicates its axes: the variable named in the same column is on the x-axis, and the variable named in the same row is on the y-axis.

We can look at these pairs to determine which relationship is more linear, which is curved, which has more spread, etc.
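As a rough numerical companion to the pairs plot, the correlation matrix gives one number per pair of variables. This is a sketch only (hills.cor is an illustrative name). Correlation measures linear association alone, so it does not replace looking at the panels for curvature or unusual points.

```r
library(MASS)

# Pairwise correlations between dist, climb and time.
hills.cor <- cor(hills)
round(hills.cor, 2)
```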

Making Your Own Plotting Functions:

If you have several plots to make or if you are working on plots for a project/paper and have to re-plot over and over to get what you want, you may want to write a for loop or a function to create your plots.
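As an illustration of the idea, here is a hypothetical helper (the name box.each and its arguments are assumptions made for this sketch, not a function from the lab) that uses a for loop to draw one boxplot per column of a data frame, side by side, as was done by hand for the hills data.

```r
library(MASS)

# Hypothetical helper: one boxplot per column of a data frame, side by side.
box.each <- function(df, ylabs = names(df)) {
  old.par <- par(mfrow = c(1, ncol(df)))   # one row, one panel per column
  on.exit(par(old.par))                    # restore the old layout on exit
  for (j in seq_len(ncol(df))) {
    boxplot(df[[j]], ylab = ylabs[j])
  }
  invisible(ncol(df))                      # number of panels drawn
}

n.panels <- box.each(hills, ylabs = c("Distance in Miles",
                                      "Height Gained in Feet",
                                      "Record Time in Minutes"))
```

Wrapping the plotting code in a function means a change such as a new label or layout is made in one place and then re-run with a single call.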

my.plots ................
