STAT 1261/2260: Principles of Data Science



Lecture 5 - ggplot2 (2/3): In-Class Exercises

Exercise 1: Father's Height vs. Children's Height

Use the famous Galton data set from the mosaic package.
Load the mosaic package and check the data set using the head function.

    library(mosaic)
    head(Galton)
    ##   family father mother sex height nkids
    ## 1      1   78.5   67.0   M   73.2     4
    ## 2      1   78.5   67.0   F   69.2     4
    ## 3      1   78.5   67.0   F   69.0     4
    ## 4      1   78.5   67.0   F   69.0     4
    ## 5      2   75.5   66.5   M   73.5     4
    ## 6      2   75.5   66.5   M   72.5     4

Exercise 1: Father's Height vs. Children's Height (cont.)

Add context to the plot created in Part 4 of the exercise in Lecture 4.
Add the title "Father's Height vs. Children's Height" to the plot in Part 4.
Change the x label to Father's Height (inch) and the y label to Child's Height (inch).

    ggplot(data = Galton, mapping = aes(x = father, y = height)) +
      geom_point(aes(color = sex)) +
      geom_smooth(method = "lm", aes(color = sex)) +
      labs(title = "Father's Height vs. Children's Height",
           x = "Father's Height (inch)",
           y = "Child's Height (inch)")

Exercise 2: Override the default y

    # HELPrct is available once the mosaic package is loaded
    g <- ggplot(data = HELPrct, aes(x = substance, fill = substance))
    g + geom_bar()

Show the proportion for each category.

Exercise 2: Override the default y (cont.)

Alternative way:

    # ggplot2 3.4.0 and later prefer after_stat(prop) over ..prop..
    g + geom_bar(aes(y = ..prop.., group = 1))

Exercise 2: Override the default y (cont.)

Notice that the plot becomes grey even though fill=substance is specified in ggplot(). Why? How can you solve this problem?

    g + geom_bar(aes(y = ..count.. / sum(..count..)))

Exercise 3: Improve the scatterplot

What is the problem with this plot? How could you improve it?

    ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
      geom_point()

Exercise 3: Improve the scatterplot (cont.)

    g <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy))
    g + geom_point(position = "jitter")
    g + geom_jitter()
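The jittered plots in Exercise 3 address overplotting: cty and hwy in mpg are whole numbers, so many cars fall on exactly the same point. The sketch below is one possible way to explore the same idea a little further and is not part of the original exercise. The jitter amounts (0.3), the seed value, the alpha level, and the object names p_jitter, p_alpha, and p_count are illustrative assumptions.

```r
library(ggplot2)

# mpg ships with ggplot2; cty and hwy are integer-valued, so points overlap.
g <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy))

# Count the distinct (cty, hwy) pairs; fewer pairs than rows means overplotting.
n_distinct_pairs <- nrow(unique(mpg[, c("cty", "hwy")]))

# Option A: jitter with explicit amounts and a seed, so the plot is reproducible.
# (width, height, and seed values are assumptions chosen for illustration.)
p_jitter <- g +
  geom_point(position = position_jitter(width = 0.3, height = 0.3, seed = 1261))

# Option B: keep exact positions but make points semi-transparent,
# so darker spots indicate more overlapping cars.
p_alpha <- g + geom_point(alpha = 0.3)

# Option C: map the number of overlapping observations to point size.
p_count <- g + geom_count()

# Print any of the plots to display it, for example:
# p_jitter
```

Jittering trades exact positions for visibility, while the alpha and geom_count() variants keep the true coordinates, so the choice may depend on whether readers need to read off exact values.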

