CSSS 508: Intro R
2/15/06
Lab 6: Plotting Practice
We’re going to use plotting to do some exploratory data analysis on a few datasets.
The Datasets:
Recall that R has many datasets in its MASS package.
> library(MASS)
We can use the help pages to list them.
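As a small sketch of how the listing can be pulled up (the variable name mass.data is ours, chosen for illustration):

```r
library(MASS)

# Names of all datasets shipped with MASS, as a character vector
mass.data <- data(package = "MASS")$results[, "Item"]
head(mass.data)

# Help page for a single dataset
help("anorexia")
```

In an interactive session, calling data(package = "MASS") on its own shows the same list with a one-line description of each dataset.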
First Example: Anorexia Data on Weight Change: anorexia
> dim(anorexia)
[1] 72 3
72 Subjects
Three variables: Treatment Group, Preweight, Postweight
Treatment Group: Cont (Control), CBT (Cognitive Behavioral Trt), FT (Family Trt)
> tr.group <- anorexia$Treat
> table(tr.group)
tr.group
 CBT Cont   FT
  29   26   17
We can use scatterplots to look at the preweights vs. the postweights. If a person has gained weight, their postweight will be higher than their preweight (i.e., their point will lie above the line y = x).
> pre.wt <- anorexia$Prewt
> post.wt <- anorexia$Postwt
> plot(pre.wt,post.wt)
> plot(pre.wt,post.wt,type="n")
> points(pre.wt[tr.group=="Cont"],post.wt[tr.group=="Cont"],col=2)
> points(pre.wt[tr.group=="CBT"],post.wt[tr.group=="CBT"],col=3)
> points(pre.wt[tr.group=="FT"],post.wt[tr.group=="FT"],col=4)
> abline(0,1)
(abline adds a line to the plot – here with an intercept of 0 and a slope of 1)
> title("Comparing Weights by Treatment Group")
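The colored plot has no key telling the reader which color is which group. A minimal sketch of the same plot with a legend follows; it assumes the color assignment used in the points( ) calls (2 for Cont, 3 for CBT, 4 for FT), and the name grp.cols is ours.

```r
library(MASS)

tr.group <- anorexia$Treat
pre.wt   <- anorexia$Prewt
post.wt  <- anorexia$Postwt

# Assumed mapping, matching the col= values used in the points() calls
grp.cols <- c(Cont = 2, CBT = 3, FT = 4)

# Index the named vector by group label to get one color per subject
pt.cols <- grp.cols[as.character(tr.group)]

plot(pre.wt, post.wt, col = pt.cols,
     main = "Comparing Weights by Treatment Group")
abline(0, 1)   # y = x: points above this line gained weight
legend("topleft", legend = names(grp.cols), col = grp.cols, pch = 1)
```

Indexing a named vector by the group label replaces the three separate points( ) calls with one plot( ) call, and the same vector feeds the legend so the key cannot drift out of step with the points.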
We really want to study the difference in the two weights.
> wt.change <- post.wt - pre.wt
> summary(wt.change)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-12.200  -2.225   1.650   2.764   9.100  21.500
So what is the average weight change in each treatment group?
> mean(wt.change[tr.group=="Cont"])
[1] -0.45
> mean(wt.change[tr.group=="CBT"])
[1] 3.006897
> mean(wt.change[tr.group=="FT"])
[1] 7.264706
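The three group means can also be computed in one call with tapply( ), which applies a function to a vector split by a factor. A short sketch (the name grp.means is ours):

```r
library(MASS)

tr.group  <- anorexia$Treat
wt.change <- anorexia$Postwt - anorexia$Prewt

# Mean weight change within each level of tr.group
grp.means <- tapply(wt.change, tr.group, mean)
grp.means
```

The result is a named vector with one entry per treatment group, in the order of the factor levels (CBT, Cont, FT), and its values agree with the three means computed one at a time.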
But the mean by itself doesn’t give us the whole picture...
> boxplot(wt.change[tr.group=="Cont"],wt.change[tr.group=="CBT"],wt.change[tr.group=="FT"],names=c("Control","CBT","FT"))
> title("Wt Change by Treatment Group")
So the control group appears to be fairly centered at zero. The majority of the CBT group is in a similar range to half of the control group. The FT group is almost entirely above the other two treatment groups.
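The same boxplots can be drawn with the formula interface, which splits the response by a factor without subsetting by hand. This is a sketch rather than the lab's own code; the dashed line at zero and the name bp are our additions.

```r
library(MASS)

tr.group  <- anorexia$Treat
wt.change <- anorexia$Postwt - anorexia$Prewt

# One box per level of tr.group; boxplot() also returns the plotted statistics
bp <- boxplot(wt.change ~ tr.group, ylab = "Weight Change")
abline(h = 0, lty = 2)   # reference line: no change in weight
title("Wt Change by Treatment Group")

# Rows: lower whisker, lower hinge, median, upper hinge, upper whisker
bp$stats
```

Note that the boxes appear in factor-level order (CBT, Cont, FT) rather than the Control, CBT, FT order used above. The matrix bp$stats gives the numbers behind each box, which is handy for checking a visual impression.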
Second Example: Scottish Hill Races: hills
Record Times in 1984 for 35 Scottish hill races.
dist: distance in miles
climb: total height gained during the route in feet
time: record time in minutes
> dist <- hills$dist
> climb <- hills$climb
> time <- hills$time
> summary(dist)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  2.000   4.500   6.000   7.529   8.000  28.000
> summary(climb)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    300     725    1000    1815    2200    7500
> summary(time)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  15.95   28.00   39.75   57.88   68.63  204.60
> par(mfrow=c(1,3))
> boxplot(dist,ylab="Distance in Miles")
> boxplot(climb,ylab="Height Gained in Feet")
> boxplot(time,ylab="Record Time in Minutes")
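The par(mfrow=c(1,3)) setting stays in effect for later plots on the same device. One common pattern, sketched here with the hypothetical name op, is to save the old settings and restore them afterwards:

```r
library(MASS)

op <- par(mfrow = c(1, 3))   # par() returns the previous value of what it changes
boxplot(hills$dist,  ylab = "Distance in Miles")
boxplot(hills$climb, ylab = "Height Gained in Feet")
boxplot(hills$time,  ylab = "Record Time in Minutes")
par(op)                      # back to the previous layout
```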
The function pairs( ) plots scatterplots of all pairs of variables. We have three variables, so there are 3*2 = 6 ordered pairs to plot: each pair of variables appears twice, once with each variable on the x-axis.
> pairs(hills)
The names of the variables are on the diagonal. Each panel puts the variable named in its column on the x-axis and the variable named in its row on the y-axis.
We can look at these pairs to determine which relationship is more linear, which is curved, which has more spread, etc.
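A correlation matrix is one rough numerical companion to the pairs plot. The sketch below (the name hills.cor is ours) only measures linear association, so it cannot detect curvature and can be swayed by a few extreme races; it supplements the plots rather than replacing them.

```r
library(MASS)

# Pairwise Pearson correlations between dist, climb and time
hills.cor <- cor(hills)
round(hills.cor, 2)
```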
Making Your Own Plotting Functions:
If you have several plots to make or if you are working on plots for a project/paper and have to re-plot over and over to get what you want, you may want to write a for loop or a function to create your plots.
my.plots ................
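As one illustration of the idea, here is a hypothetical helper (not the function defined in this lab) that wraps the grouped scatterplot from the first example in a function with a for loop. The name scatter.by.group, its arguments, and the choice of colors are all assumptions made for this sketch.

```r
library(MASS)

# Hypothetical helper: scatterplot of y against x, one color per group level.
# Colors 2, 3, 4, ... are assigned in factor-level order.
scatter.by.group <- function(x, y, group, main = "", xlab = "x", ylab = "y") {
  group <- factor(group)
  levs  <- levels(group)
  cols  <- seq_along(levs) + 1

  # Empty frame first, so the axis limits cover every group
  plot(x, y, type = "n", main = main, xlab = xlab, ylab = ylab)

  # Add the points for one group at a time
  for (i in seq_along(levs)) {
    keep <- group == levs[i]
    points(x[keep], y[keep], col = cols[i])
  }

  abline(0, 1)   # reference line y = x
  legend("topleft", legend = levs, col = cols, pch = 1)

  # Return the level-to-color mapping without printing it
  invisible(setNames(cols, levs))
}

used.cols <- scatter.by.group(anorexia$Prewt, anorexia$Postwt, anorexia$Treat,
                              main = "Comparing Weights by Treatment Group",
                              xlab = "Preweight", ylab = "Postweight")
```

Once the plot lives in a function, re-plotting with a different title, different labels, or a different dataset is a one-line change.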