STAT 1261/2260: Principles of Data Science



Lecture 8 - Data Wrangling: One Table (2/2): In-Class Exercises

Exercise 1: Sorting a data set

Sort the rows by term length (ascending) and then by party, by revising the previous code.

library(tidyverse)   # presidential ships with ggplot2
library(lubridate)

# Term length in years: elapsed time divided by a one-year duration
mypresidents <- presidential %>%
  mutate(term_length = (end - start) / dyears(1))

# Previous code: longest terms first, ties broken by start date
mypresidents %>% arrange(desc(term_length), start)

# Revised: shortest terms first, ties broken by party
mypresidents %>% arrange(term_length, party)

## # A tibble: 11 x 5
##    name       start      end        party      term_length
##    <chr>      <date>     <date>     <chr>            <dbl>
##  1 Ford       1974-08-09 1977-01-20 Republican        2.45
##  2 Kennedy    1961-01-20 1963-11-22 Democratic        2.84
##  3 Carter     1977-01-20 1981-01-20 Democratic        4.00
##  4 Bush       1989-01-20 1993-01-20 Republican        4.00
##  5 Johnson    1963-11-22 1969-01-20 Democratic        5.17
##  6 Nixon      1969-01-20 1974-08-09 Republican        5.55
##  7 Clinton    1993-01-20 2001-01-20 Democratic        8.01
##  8 Obama      2009-01-20 2017-01-20 Democratic        8.01
##  9 Eisenhower 1953-01-20 1961-01-20 Republican        8.01
## 10 Reagan     1981-01-20 1989-01-20 Republican        8.01
## 11 Bush       2001-01-20 2009-01-20 Republican        8.01

Where term lengths tie (Carter and Bush at 4.00 years; the five presidents at 8.01 years), party breaks the tie, with Democratic sorting before Republican.

Exercise 2

Revise the following code to compare the median term length of the two parties.

# Original code: group size, mean, and standard deviation of term length by party
mypresidents %>%
  group_by(party) %>%
  summarize(
    N = n(),
    avg_term_length = mean(term_length),
    std_term_length = sd(term_length)
  )

## # A tibble: 2 x 4
##   party          N avg_term_length std_term_length
##   <chr>      <int>           <dbl>           <dbl>
## 1 Democratic     5            5.60            2.34
## 2 Republican     6            6.00            2.40

# Revised: median term length by party
mypresidents %>%
  group_by(party) %>%
  summarize(
    N = n(),
    med_term_length = median(term_length)
  )

## # A tibble: 2 x 3
##   party          N med_term_length
##   <chr>      <int>           <dbl>
## 1 Democratic     5            5.17
## 2 Republican     6            6.78

Exercise 3

Which month has the longest departure delays on average? Revise the code below to answer it.

library(nycflights13)

# Previous code: average arrival delay by destination for delayed flights on December 25
ArrDelay <- flights %>%
  select(year, month, day, arr_delay, dest) %>%
  filter(month == 12 & day == 25 & arr_delay > 0) %>%
  group_by(dest) %>%
  summarize(AvgArrDelay = mean(arr_delay, na.rm = TRUE)) %>%
  arrange(desc(AvgArrDelay))
print(ArrDelay, n = 10)

# Revised: average departure delay by month for delayed flights
library(nycflights13)
DepDelay <- flights %>%                   # create a new data set
  select(month, dep_delay) %>%            # select the variables of interest
  filter(dep_delay > 0) %>%               # keep only flights that departed late
  group_by(month) %>%                     # group the data by month
  summarize(AvgDepDelay = mean(dep_delay, na.rm = TRUE)) %>%  # average delay per month, ignoring missing values
  arrange(desc(AvgDepDelay))              # order months by average departure delay
print(DepDelay)

## # A tibble: 12 x 2
##    month AvgDepDelay
##    <int>       <dbl>
##  1     6        49.8
##  2     7        48.8
##  3     4        44.2
##  4     3        39.6
##  5     5        39.2
##  6     8        37.3
##  7    12        37.2
##  8     9        35.7
##  9     1        35.3
## 10     2        35.3
## 11    10        31.6
## 12    11        28.7

Among flights that departed late, June (month 6) has the highest average departure delay, about 49.8 minutes.
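For comparison with Exercise 1, the same two-key sort can be done without dplyr. The sketch below is a base-R substitute, not part of the exercise: it rebuilds the presidential rows from the Exercise 1 output in a data frame named pres (a hypothetical name), measures term length as days / 365.25 (an assumption), and sorts with order(). The first argument to order() is the primary key, later arguments break ties, and rows that are still tied keep their original order, which matches the order of the tied 8.01 rows in the arrange() output.

```r
# Base-R sketch of Exercise 1 (not dplyr): sort by term length, then party.
# Rows are copied from the presidential output shown in Exercise 1;
# `pres` and `sorted` are hypothetical names used only for illustration.
pres <- data.frame(
  name  = c("Eisenhower", "Kennedy", "Johnson", "Nixon", "Ford", "Carter",
            "Reagan", "Bush", "Clinton", "Bush", "Obama"),
  start = as.Date(c("1953-01-20", "1961-01-20", "1963-11-22", "1969-01-20",
                    "1974-08-09", "1977-01-20", "1981-01-20", "1989-01-20",
                    "1993-01-20", "2001-01-20", "2009-01-20")),
  end   = as.Date(c("1961-01-20", "1963-11-22", "1969-01-20", "1974-08-09",
                    "1977-01-20", "1981-01-20", "1989-01-20", "1993-01-20",
                    "2001-01-20", "2009-01-20", "2017-01-20")),
  party = c("Republican", "Democratic", "Democratic", "Republican",
            "Republican", "Democratic", "Republican", "Republican",
            "Democratic", "Republican", "Democratic"),
  stringsAsFactors = FALSE
)

# Term length in years, assuming 365.25-day years
pres$term_length <- as.numeric(pres$end - pres$start) / 365.25

# order(): the first key sets the primary sort, the second breaks ties,
# and rows still tied keep their original (chronological) order
sorted <- pres[order(pres$term_length, pres$party), ]
print(sorted[, c("name", "party", "term_length")], digits = 3)
```

Swapping in order(-pres$term_length, pres$start) would mirror the previous code, arrange(desc(term_length), start).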

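The revised Exercise 2 summary also has a short base-R counterpart, sketched here as an alternative to group_by() and summarize(): table() plays the role of n(), and tapply() applies median() within each party. The term lengths are typed in from the rounded Exercise 1 output, so the medians agree with the table in Exercise 2 up to rounding. With six Republicans there is no single middle value, so median() averages the third and fourth sorted values, (5.55 + 8.01) / 2 = 6.78.

```r
# Term lengths (years) and parties, typed in from the rounded Exercise 1 output
term_length <- c(2.45, 2.84, 4.00, 4.00, 5.17, 5.55, 8.01, 8.01, 8.01, 8.01, 8.01)
party <- c("Republican", "Democratic", "Democratic", "Republican", "Democratic",
           "Republican", "Democratic", "Democratic", "Republican", "Republican",
           "Republican")

# Group sizes, like N = n() inside summarize()
n_by_party <- table(party)

# Median term length per party, like median(term_length) inside summarize()
med_by_party <- tapply(term_length, party, median)

print(n_by_party)
print(med_by_party)
```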

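The Exercise 3 pipeline filters, groups, averages, and sorts. The sketch below repeats those steps in base R with subset(), aggregate(), and order(), on a small hypothetical data frame named toy whose values are invented for illustration and are not nycflights13 data. It also makes one detail visible: comparing NA with 0 gives NA, and both subset() and dplyr's filter() drop such rows, so filter(dep_delay > 0) already removes missing delays and na.rm = TRUE in the revised code acts as a safeguard rather than a necessity.

```r
# Hypothetical toy data (NOT nycflights13): month and departure delay in minutes
toy <- data.frame(
  month     = c(1, 1, 1, 2, 2, 2, 3, 3),
  dep_delay = c(10, -5, NA, 30, 50, -2, 5, 25)
)

# Keep delayed flights only; subset() treats an NA condition as FALSE,
# so the flight with a missing delay is dropped here, as in filter()
delayed <- subset(toy, dep_delay > 0)

# Average delay per month among delayed flights (like group_by + summarize)
avg_by_month <- aggregate(dep_delay ~ month, data = delayed, FUN = mean)
names(avg_by_month)[2] <- "AvgDepDelay"

# Order months from largest to smallest average delay (like arrange(desc(...)))
avg_by_month <- avg_by_month[order(avg_by_month$AvgDepDelay, decreasing = TRUE), ]
print(avg_by_month)
```

With these toy values, month 2 comes first with an average of 40 minutes.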