06 - Intro to graphics (with ggplot2) (part 2)
ST 597 | Spring 2017 University of Alabama
Contents
1 Cleveland Dot Plot
1.1 Your Turn
1.2 Cleveland Dot Plot Aesthetics
2 Line Graphs
2.1 economics data
2.2 Your Turn: Stock Price
3 Estimating Distributions
3.1 Discrete Data
3.2 Continuous Data
4 Histograms
4.1 geom_hist()
5 Kernel Density Estimation
5.1 geom_density()
5.2 Some Useful Settings: geom_histogram, geom_density
6 Boxplot and Violin Plot
6.1 Boxplot
6.2 Violin Plot
6.3 Discretizing continuous variables for boxplot
6.4 Sequentile Quantiles (Fanchart, Fanplot)
6.5 Your Turn: Old Faithful
Required Packages and Data
library(tidyverse)
library(gcookbook)
library(Lahman)   # may need to: install.packages("Lahman")
1 Cleveland Dot Plot
William Cleveland wrote a popular book on visualizing data, The Elements of Graphing Data, that has many useful suggestions. One principle he stressed was to reduce the cognitive strain on the viewer. One way to do this is to use as little ink as possible. The Cleveland dot plot contains the same information as a bar graph, but instead of spending ink on the whole bar, it removes the bar altogether and places a dot at the bar height (using geom_point()).
Consider the baseball data Batting from the Lahman package. We want to display the number of home runs for each team during the 2014 season.
library(Lahman)   # load the Lahman package
data(Batting)     # load the Batting data
H = Batting %>%
  filter(yearID == 2014) %>%
  group_by(teamID) %>%
  summarize(teamHR = sum(HR), teamBA = sum(H)/sum(AB), teamR = sum(R))
glimpse(H)
#> Observations: 30
#> Variables: 4
#> $ teamID ARI, ATL, BAL, BOS, CHA, CHN, CIN, CLE, COL, DET, HOU,...
#> $ teamHR 118, 123, 211, 123, 155, 157, 131, 142, 186, 155, 163, ...
#> $ teamBA 0.2484, 0.2407, 0.2563, 0.2441, 0.2526, 0.2387, 0.2376,...
#> $ teamR 615, 573, 705, 634, 660, 614, 595, 669, 755, 757, 629, ...
I added team batting average (teamBA) and runs (teamR) to dress up our plot.
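To see why teamBA is computed as sum(H)/sum(AB) rather than by averaging each player's batting average, here is a minimal sketch on made-up data. The data frame toy and its values are hypothetical and are not taken from Lahman.

```r
library(dplyr)

# Hypothetical toy data (NOT from Lahman): two players on one team
toy <- data.frame(
  teamID = c("AAA", "AAA"),
  H  = c(1, 150),   # hits
  AB = c(2, 600)    # at bats
)

toy_summary <- toy %>%
  group_by(teamID) %>%
  summarize(
    teamBA     = sum(H) / sum(AB),  # pooled: total hits / total at bats
    meanPlayer = mean(H / AB)       # unweighted mean of player averages
  )

toy_summary
```

Here the pooled average is 151/602 (about 0.251), while the unweighted mean of the two player averages is 0.375, because the player with only 2 at bats counts as much as the player with 600.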
Compare the bar graph with the dot plot.
#- (left) bar graph
ggplot(H) + geom_col(aes(x=reorder(teamID, -teamHR), y=teamHR))
#- (right) corresponding dot plot
ggplot(H) + geom_point(aes(x=reorder(teamID, -teamHR), y=teamHR))
[Figure: (left) bar graph and (right) dot plot of teamHR against reorder(teamID, -teamHR), teams ordered from most to fewest home runs.]
1.1 Your Turn
Your Turn #1 : Dot Plot vs. Bar Plot
1. What are the differences between the two plots?
2. What aspects can be improved with the dot plot?
1.2 Cleveland Dot Plot Aesthetics
The real strength is in adding additional aesthetics, like size and color.
ggplot(H) +
  geom_point(aes(x=reorder(teamID, -teamHR), y=teamHR,
                 size=teamR, color=teamBA>.260))
[Figure: dot plot of teamHR against reorder(teamID, -teamHR), with point size mapped to teamR and point color mapped to teamBA > 0.26.]
Final touches include putting team on the y-axis, changing the theme, and adding a title.
#- new theme
dot_theme = theme_bw() +
  theme(panel.grid.major.x=element_blank(),
        panel.grid.minor.x=element_blank(),
        panel.grid.major.y=element_line(color="grey60", linetype="dotted"))
#- Cleveland dot plot
ggplot(H) +
  geom_point(aes(x=teamHR, y=reorder(teamID, teamHR), size=teamR, color=teamBA>.260)) +
  dot_theme +
  labs(title = "2014 MLB", x="home runs", y="team") +
  scale_color_manual(name="BA>.260", values=c("blue", "orange")) +
  scale_size(name="Runs", range=c(1,6))
[Figure: Cleveland dot plot titled "2014 MLB", with team on the y-axis and home runs on the x-axis; color legend "BA>.260" (FALSE, TRUE) and size legend "Runs".]
The Cleveland Dot Plot is an alternative to a bar plot. There is also a dot plot (geom_dotplot()) that is an alternative to a histogram.
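As a rough illustration of that histogram-style dot plot, the sketch below stacks one dot per observation into bins. The choice of the built-in mtcars data and of binwidth = 1.5 is an assumption made for illustration only.

```r
library(ggplot2)

# mtcars ships with base R; mpg is a continuous variable
p_dot <- ggplot(mtcars, aes(x = mpg)) +
  geom_dotplot(binwidth = 1.5) +            # each dot is one car
  scale_y_continuous(NULL, breaks = NULL)   # the y-axis scale is not meaningful here

p_dot
```

Unlike the Cleveland dot plot, which shows one summary value per category, this plot shows the distribution of a single continuous variable.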
2 Line Graphs
2.1 economics data
The economics data from the ggplot2 package contains some economic time series.
library(dplyr)
library(ggplot2)
data(economics)
glimpse(economics)
#> Observations: 574
#> Variables: 6
#> $ date     1967-07-01, 1967-08-01, 1967-09-01, 1967-10-01, 1967...
#> $ pce      507.4, 510.5, 516.3, 512.9, 518.1, 525.8, 531.5, 534....
#> $ pop      198712, 198911, 199113, 199311, 199498, 199657, 19980...
#> $ psavert  12.5, 12.5, 11.7, 12.5, 12.5, 12.1, 11.7, 12.2, 11.6,...
#> $ uempmed  4.5, 4.7, 4.6, 4.9, 4.7, 4.8, 5.1, 4.5, 4.1, 4.6, 4.4...
#> $ unemploy 2944, 2945, 2958, 3143, 3066, 3018, 2878, 3001, 2877,...
We can plot the number of unemployed (unemploy, in thousands) over time with a line plot (using geom_line()).
ggplot(economics, aes(date, unemploy)) + geom_line()
[Figure: line graph of unemploy against date.]
ggplot recognizes the date class and automatically labels the tick marks by year.
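The default date breaks can also be set explicitly. The sketch below uses scale_x_date(); the 5-year spacing and the "%Y" label format are arbitrary choices for illustration.

```r
library(ggplot2)

p_dates <- ggplot(economics, aes(date, unemploy)) +
  geom_line() +
  scale_x_date(date_breaks = "5 years",  # one tick every five years
               date_labels = "%Y")       # label each tick with the 4-digit year

p_dates
```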
We can fancy it up, maybe add some points.
ggplot(economics, aes(date, unemploy)) +
  geom_line(size=2, color="orange") +
  geom_point(shape=21, color='blue', fill='white', size=1)
[Figure: line graph of unemploy against date, drawn as a thick orange line with small blue-outlined, white-filled points.]
We can shade the region under the line with geom_area()
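A minimal sketch of such a shaded plot follows. The fill color, transparency, and the extra line layer are assumptions chosen for illustration, not necessarily the settings used in these notes.

```r
library(ggplot2)

p_area <- ggplot(economics, aes(date, unemploy)) +
  geom_area(fill = "orange", alpha = 0.4) +  # shade from the x-axis up to the line
  geom_line(color = "blue")                  # redraw the series on top of the shading

p_area
```

Because geom_area() fills down to zero, the y-axis of this plot starts at 0 rather than at the smallest value of unemploy.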