Cudo.carleton.ca



### This script goes with the Visualization in R workshop
# delivered on Wed, Nov 28, 2018 and Fri, Nov 30, 2018
# Designer and instructor: Mladen Rakovic

#### Module 1: Getting Ready ####
install.packages("ggplot2")
# In case you need to install from binaries, uncomment and run the following line
# install.packages("ggplot2", dependencies = TRUE, type = "binary")
library(ggplot2)
data(package = "ggplot2") # get a list of datasets embedded in ggplot2
mpg # print out the dataset for initial inspection
View(mpg) # if you want it nicely presented in a new RStudio tab
# Three key components of ggplot2 grammar: data, aesthetics, and geom

#### Problem 1: Can you guess what the following plots will look like? ####
ggplot(midwest, aes(x = area, y = poptotal)) + geom_point()
ggplot(economics, aes(x = date, y = unemploy)) + geom_line()
ggplot(mpg, aes(x = cty)) + geom_histogram()

#### Module 2: Scatterplot ####
# Create a scatterplot showing a correlation between engine size and fuel economy
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() # add a layer on top of plot
# Geom is a graphical representation of the data in the plot (point, line, bar...)

#### Let's add a linear fit (the line of best fit) to the plot
#### Are there any outliers?
# xlim() and ylim() drop the points outside the given ranges (with a warning),
# so the fit ignores them
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) + # se toggles the confidence band; turn it off by setting it to FALSE
  xlim(c(1.5, 6)) +
  ylim(c(10, 40))

#### Problem 2: Create a scatterplot from the midwest dataset (area vs population), draw a linear fit without a confidence interval, and identify and remove outliers ####
# The result is saved as plot_midwest because it is reused in Module 3
plot_midwest <- ggplot(midwest, aes(x = area, y = poptotal)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  xlim(c(0, 0.1)) +
  ylim(c(0, 1000000))
plot_midwest # print the saved plot
# options(scipen = 999) # in case scientific notation bothers you, you can turn it off

#### Module 3: Colour, size, shape ####
ggplot(mpg, aes(x = displ, y = cty, colour = class)) +
  geom_point(size = 3) +
  facet_wrap(~class) # using facets can sometimes be helpful in interpreting data

#### We can use shape instead of colour to distinguish among classes ####
# Note: the default shape palette has six shapes, so with seven classes
# ggplot2 warns and leaves one class unplotted
ggplot(mpg, aes(x = displ, y = cty, shape = class)) + geom_point(size = 3)

#### Problem 3.1: Create a scatterplot from the midwest dataset (area vs population), add a linear fit, and colour each data point (i.e., county) according to its state. ####
midwest_scatter <- ggplot(midwest, aes(x = area, y = poptotal, colour = state)) +
  geom_point() # save for future use

#### Problem 3.2: Provide facets from the same plot, but without colours. ####
ggplot(midwest, aes(x = area, y = poptotal)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~state)

#### We can use plot_midwest with removed outliers and add facet_wrap on top of that
plot_midwest + facet_wrap(~state)
#### How could we interpret our dataset now?

#### Module 4: Boxplots and Jittered Points ####
# The relationship between a continuous variable and the levels of a categorical variable
# Try with a scatterplot first
ggplot(mpg, aes(x = class, y = cty)) + geom_point()
# Jittering adds useful random noise to the data to make the plot readable
ggplot(mpg, aes(x = class, y = cty)) + geom_jitter()
# Boxplot provides information about the shape of the distribution
ggplot(mpg, aes(x = class, y = cty)) + geom_boxplot()
# Violin plot shows the density of the distribution
ggplot(mpg, aes(x = class, y = cty)) + geom_violin()

#### Problem 4: Which of the three types of plots is most informative when plotting state vs area in the midwest dataset? Please plot and compare the results. ####
ggplot(midwest, aes(x = state, y = area)) + geom_jitter()
ggplot(midwest, aes(x = state, y = area)) + geom_boxplot()
ggplot(midwest, aes(x = state, y = area)) + geom_violin()

#### Module 5: Histograms and Frequency Polygons ####
ggplot(mpg, aes(x = cty)) + geom_histogram(binwidth = 1)
ggplot(mpg, aes(cty)) + geom_freqpoly()
# Playing with the binwidth parameter may help us get more informative histograms or frequency polygons

#### Problem 5: Create a histogram that shows the distribution of total population across counties. Tune the binwidth parameter to get an effective plot. ####
# binwidth is in the units of poptotal (people), so 15 gives extremely narrow bins;
# try much larger values too
ggplot(midwest, aes(poptotal)) + geom_histogram(binwidth = 15)

#### Module 6: Bar Chart ####
ggplot(mpg, aes(x = manufacturer)) + geom_bar()

#### Problem 6: Create a bar chart that shows counts of counties in each state ####
ggplot(midwest, aes(state)) + geom_bar()

#### Module 7: Time Series with Line and Path Plots ####
# economics dataset
# Line plots
ggplot(economics, aes(x = date, y = unemploy / pop)) + geom_line(colour = "red")
# Path plots
ggplot(economics, aes(unemploy / pop, uempmed)) + geom_path() + geom_point()

#### What is the most effective way to communicate the change in population between 1967 and 2014? Please create a plot.
ggplot(economics, aes(date, pop)) + geom_line(colour = "green")

#### Module 8: Adding plot, axis and legend labels ####
plot_midwest <- ggplot(midwest, aes(x = area, y = poptotal, colour = state)) +
  geom_point()
plot_midwest +
  labs(title = "Total Population Versus Area of the County",
       subtitle = "Country: USA",
       caption = "Reported in Smith et al. (2018)",
       x = "County Area",
       y = "Total Population",
       colour = "State")

#### References ####
# Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer.
# ?ggplot2 in the R console - you will be given a help page in the Helpful Tools panel
# use Google ...

