ISQA 521 – Spring 2021 - Portland State University



Homework Week 3

Short Answer Questions

What is a run chart?
Define a stable process. What does the run chart of a stable process look like? What is the best way to assess stability?
Distinguish between a run chart and a time series plot.
What are the two types of visualizations for displaying multiple time series?
How does the concept of a serial number relate to storing dates in Excel and R?
What is a map projection? What is its purpose?
What role does a shapefile play when creating a map on the computer?
What is the role of a simple features data set in computer map making?
Which ggplot2 function specifies the projection underlying a specific map?
What is the distinction between a raster image map and a vector image map?
Maps require geocodes. What are they?
Which ggplot2 function plots a simple features data set? How many such function calls are needed to create a specific map?
What is a choropleth map?

Worked Problems

1. Plot Time Oriented Data: Run Chart

An example is an assembly line process that fills cereal boxes. To calibrate the machine that releases the cereal into the box, record the weight in grams of 50 consecutive cereal boxes. The target value, the filled weight stated on the cereal box, is 350 grams.

The run chart provides insight into the dynamics of an ongoing process. Because every process outcome contains inherent random variation, the run chart displays these random fluctuations, even for data values from a stable process. The run chart of a stable process centers on a horizontal line, with random variation about that center line at a constant level. One purpose of the run chart is to evaluate the stability of the process.

The data:

a. Construct the run chart of the variable Weight.
b. Does the production process appear stable (constant mean and variability)?

2. Plot Time Oriented Data: Time Series

The following data set reports deaths and other statistics due to COVID-19, updated daily.

The data:

The file is 4.1 MB, so you may prefer to download it to your computer instead of reading it from the web each time you re-run your analysis. To download:

download.file("", "covid.csv")

The second parameter value is the destination file, which will be saved in the current working directory of your R session. Get the location of your current working directory from getwd(). Put a path in front of the file name to save the file somewhere else.

a. As always, examine the data.
i) How many rows of data are in the full data set?
ii) Look at the first ten rows. From the output of Read(), we will focus only on a subset of variables. Read() also provides the column indices to identify the variables.
ggplot2 expects data in long format. Is the data in wide format or long format? Do we need to transform the data for ggplot2?
b. Subset the data table to include only data for the USA, Italy, and Brazil. The corresponding two-character country codes, given by the variable geoId, are US, BR, and IT. Retain only the variables in columns 1, 5, 6, and 8 through 10. Since this is not a course in data manipulation, the R subsetting code is provided:
d <- d[.(geoId %in% c("US","BR","IT")), .(1,5,6,8:10)]
c. After the extraction (subsetting), examine the first six rows and the total number of rows. Describe.
d. Convert the character string date to the R variable type Date.
e. Invoke the base R range() function on d$dateRep to verify the range of dates. What are the beginning and ending dates?
f. Plot the time series of deaths for the USA only, using either lessR or ggplot2. First form a separate data frame by subsetting again, but leave d unchanged, as we will use it later. Interpret the time series.
g. Get a Trellis plot of the time series of deaths for the USA, Italy, and Brazil. Interpret and compare deaths across the three countries.

3. Country Map

Display a map of Great Britain that includes all cities with a population larger than 500,000, with each city's population mapped to the size of its plotted point.

The following package installations are needed to run these analyses.

install.packages("rnaturalearthhires", repos="", type="binary")
install.packages(c("ggrepel", "sf"))

Technical note: We do some subsetting of a data frame in this assignment, that is, dropping rows and/or columns. The means to do this is the base R function Extract, invoked as d[rows, cols], along with the lessR dot function .() for a data frame named d. Here rows and cols are expressions that indicate the rows and columns to extract. Learn more from the vignette called Subset a Data Frame, available by entering into R: browseVignettes("lessR")

Read the data set of the world's 15000 largest cities from Geonames. Display the output of Read() and the first several lines of the data frame.
Subset the data frame to just the cities within Great Britain, with the following variables: name, longitude, latitude, population, elevation.
Subset the rows of the data frame to just the cities with a population larger than 500,000.
Read the map data of Great Britain from the Natural Earth high-resolution data source into a simple features data frame.
Combine the map data with the city data into a single simple features data frame. Show the first several lines of the data frame.
Use ggplot2 to plot three separate layers: the Great Britain map, the cities as correspondingly sized points, and the city names.

4. Choropleth Map

Plot a map of the states of the USA with the fill color of each state depending on the number of COVID-19 deaths per million. The darker the color, the higher the death rate.

Use the map() function from package maps to show a map of the states of the USA. [Drop the plot parameter and the fill parameter from the use of map() toward the end of Sec 8.1.5 to create the mapping data frame.
No need to convert to an sf data frame for this use.]

Convert the map data to a simple features data frame called states. Do not plot the data. Display the first several lines of the transformed data and describe.
Read the COVID-19 data for the USA into a data frame called covid. Show the first several lines of the data.

Data source:

The states data lists the state names in lower case. The covid data file lists them in title case, that is, with an uppercase first letter. To merge on the values of a common variable, set the state names in the covid data frame to lower case. Examine the first several lines of the transformed data to verify the transformation. Did it work?

Add the covid data to the sf states data frame. The Data Wrangling example toward the end of Section 8.1.5 uses the function inner_join.sf() from the sf package to do the merge. The difference in this example is that the variables on which to merge the two data sets have different names, ID and State, though they contain identical data values. One option is to rename one of the variables to match the other. Another option is to merge on the common data values, specifying the two variable names in the two data frames.

This is not a course in data manipulation, so here is how to do this merge with the base R function merge():

states = merge(states, covid, by.x="ID", by.y="State")

List the first several lines of this merged data set. Describe.
Plot the choropleth map of COVID-19 death rates by USA state. The variable that indicates the deaths per million is Deaths1M.
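For Problem 1, the base R sketch below shows what a run chart is: the values plotted in production order, joined by lines, with a center line. The 50 weights are simulated (not the course data), the spread is arbitrary, and count_runs() is a hypothetical helper that counts runs above and below the median, one informal check that points vary randomly about the center. The course itself plots with lessR or ggplot2.

```r
# Simulated weights for 50 consecutive boxes: a stand-in, NOT the course data.
set.seed(521)
weight <- round(rnorm(50, mean = 350, sd = 2), 1)   # sd chosen arbitrarily

# Hypothetical helper: count runs of consecutive points on the same side
# of the median. Points exactly at the median are skipped.
count_runs <- function(x) {
  side <- sign(x - median(x))
  side <- side[side != 0]
  if (length(side) == 0) return(0L)
  length(rle(side)$lengths)
}

# Run chart: values in time order, connected, with reference lines.
plot(weight, type = "b", pch = 20,
     xlab = "Box (production order)", ylab = "Weight (grams)",
     main = "Run chart of cereal box weights (simulated)")
abline(h = median(weight), lty = 2)   # center line at the median
abline(h = 350, col = "gray50")       # stated target weight

cat("Runs about the median:", count_runs(weight), "\n")
```

For a stable process the points scatter randomly about the center line with roughly constant spread; long runs on one side, trends, or a widening band suggest instability.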
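For Problem 2a, long format means one row per observation (here, one country on one day) rather than one column per country. The sketch below uses a tiny hypothetical table and base R reshape() to show the difference; the column names day, US, IT, and deaths are illustrative only.

```r
# Hypothetical wide table: one row per day, one column of deaths per country.
wide <- data.frame(day = c(1, 2), US = c(10, 12), IT = c(8, 9))

# Stack the country columns into long format: one row per day-country pair,
# with a new column 'country' recording where each value came from.
long <- reshape(wide, direction = "long",
                varying = list(c("US", "IT")), v.names = "deaths",
                timevar = "country", times = c("US", "IT"),
                idvar = "day")
rownames(long) <- NULL
long
```

A data set that already has one row per country per date, with a column identifying the country, is in long format and needs no reshaping for ggplot2.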
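For Problem 2b, the provided lessR subsetting line has a plain base R counterpart, sketched here on a hypothetical four-row table. Apart from dateRep, deaths, and geoId, which the assignment mentions, the column names (cases, c2, c3, ...) and all values are placeholders, and the column positions are assumed.

```r
# Hypothetical 10-column table standing in for the COVID-19 data.
d <- data.frame(
  dateRep = rep("01/04/2020", 4),
  c2 = 1, c3 = 4, c4 = 2020,
  cases = c(100, 50, 30, 20),
  deaths = c(5, 3, 2, 1),
  c7 = c("w", "x", "y", "z"),
  geoId = c("US", "IT", "BR", "FR"),
  c9 = c("USA", "ITA", "BRA", "FRA"),
  c10 = c(331, 60, 211, 67)
)

# Rows: a logical test on geoId. Columns: positions 1, 5, 6, and 8 through 10.
d <- d[d$geoId %in% c("US", "BR", "IT"), c(1, 5, 6, 8:10)]
d
nrow(d)
```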
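For the short-answer question on serial numbers: both R and Excel store a date as a count of days from an origin date. The sketch below shows R's origin, 1970-01-01, and converts an Excel serial number with the origin 1899-12-30, which assumes Excel's default Windows 1900 date system and dates after February 1900.

```r
# R stores a Date as the number of days since its origin, 1970-01-01.
d1 <- as.Date("2021-01-01")
as.numeric(d1)                                  # 18628

# Excel (Windows 1900 date system) serial numbers are commonly converted
# with an origin of 1899-12-30 for dates after February 1900.
as.Date(44197, origin = "1899-12-30")           # "2021-01-01"

# Because dates are stored as numbers, subtraction gives elapsed days.
as.Date("2021-03-31") - as.Date("2021-01-01")   # Time difference of 89 days
```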
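For Problems 2d and 2e, as.Date() converts character dates once their layout is given with the format argument. The strings below are hypothetical, and the day/month/year layout is an assumption; check the actual dateRep values first, because a mismatched format silently yields NA rather than an error.

```r
# Hypothetical dateRep strings, assumed to be in day/month/year layout.
dateRep <- c("14/12/2020", "01/03/2020", "31/12/2019")

dates <- as.Date(dateRep, format = "%d/%m/%Y")
class(dates)    # "Date"
range(dates)    # earliest and latest dates

# A mismatched format does not raise an error; it returns NA.
as.Date("14/12/2020", format = "%m/%d/%Y")
```

Applied to the course data, the conversion would take the form d$dateRep <- as.Date(d$dateRep, format = ...), with the format string matching the file.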
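For Problem 2g, a Trellis (small multiples) display gives each series its own panel, whereas a superimposed display draws all series in a single panel. The base R sketch below uses simulated daily counts, not the COVID-19 data; in ggplot2 the panels would typically come from facet_wrap().

```r
# Simulated daily death counts for three countries (NOT real data).
set.seed(1)
days <- seq(as.Date("2020-03-01"), by = "day", length.out = 60)
series <- list(US = rpois(60, 50), IT = rpois(60, 20), BR = rpois(60, 30))

# One panel per country, stacked vertically, sharing one y-axis range
# so the panels can be compared directly.
y_lim <- range(unlist(series))
op <- par(mfrow = c(3, 1), mar = c(3, 4, 2, 1))
for (cc in names(series)) {
  plot(days, series[[cc]], type = "l", ylim = y_lim,
       xlab = "", ylab = "Deaths", main = cc)
}
par(op)   # restore the previous graphics settings
```

The shared y-axis is the key design choice: with independent axes, a country with far fewer deaths can look as severe as one with many.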
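For the map projection questions: a projection is a function that maps longitude and latitude on the curved Earth to x and y on a flat surface, and every such mapping distorts area, shape, distance, or direction to some degree. As one concrete example, here is the spherical Mercator projection on a unit sphere; mercator() is a hypothetical helper, and actual map making would instead specify a coordinate reference system through a package such as sf.

```r
# Spherical Mercator on a unit sphere: degrees in, planar coordinates out.
mercator <- function(lon, lat) {
  lambda <- lon * pi / 180   # longitude in radians
  phi <- lat * pi / 180      # latitude in radians
  data.frame(x = lambda, y = log(tan(pi / 4 + phi / 2)))
}

p <- mercator(lon = c(0, 0, 0), lat = c(0, 30, 60))
p
# Equal 30-degree steps in latitude give growing steps in y:
# Mercator stretches distances increasingly toward the poles.
diff(p$y)
```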
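For Problem 3, the row and column subsetting and the population-to-size mapping can be sketched in base R on a few hypothetical rows; the city names, the country column, and all numbers are invented. Point sizes use a square-root scale so that point area, not width, tracks population.

```r
# Hypothetical rows in the spirit of a world-cities file (values invented).
cities <- data.frame(
  name = c("City A", "City B", "City C", "City D"),
  country = c("GB", "GB", "GB", "FR"),
  longitude = c(-0.1, -2.2, -3.2, 2.3),
  latitude = c(51.5, 53.5, 55.9, 48.9),
  population = c(8000000, 550000, 120000, 2100000),
  elevation = c(25, 40, 60, 35)
)

# Keep Great Britain cities above 500,000 and only the needed variables.
gb <- cities[cities$country == "GB" & cities$population > 500000,
             c("name", "longitude", "latitude", "population", "elevation")]
gb

# cex scales a symbol's width, so sqrt() makes symbol area track population.
size <- 3 * sqrt(gb$population / max(gb$population))
plot(gb$longitude, gb$latitude, cex = size, pch = 21, bg = "gray70",
     xlab = "Longitude", ylab = "Latitude")
text(gb$longitude, gb$latitude, gb$name, pos = 4)   # label each point
```

In ggplot2, mapping population to the size aesthetic gives area-based scaling by default through scale_size().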
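For Problem 4, the reason to lower-case the key before calling merge() shows up clearly on two hypothetical miniature tables. The three states and the Deaths1M values are invented, and the real states object would be an sf data frame rather than a plain data frame.

```r
states <- data.frame(ID = c("alabama", "ohio", "oregon"))   # lower case
covid <- data.frame(State = c("Alabama", "Ohio", "Oregon"),  # title case
                    Deaths1M = c(10, 20, 30))                # invented values

# Before lower-casing: no key values match, so the inner join is empty.
before <- nrow(merge(states, covid, by.x = "ID", by.y = "State"))
before   # 0

# Lower-case the covid keys, then merge on the differently named columns.
covid$State <- tolower(covid$State)
states <- merge(states, covid, by.x = "ID", by.y = "State")
states
```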
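In a choropleth map, the rule "the darker the color, the higher the death rate" amounts to a function from rate to fill color. A base R sketch with a hypothetical helper rate_to_gray() follows; with ggplot2 the same idea is usually expressed by mapping Deaths1M to the fill aesthetic of geom_sf() together with a gradient scale such as scale_fill_gradient().

```r
# Hypothetical helper: linearly map rates to gray levels, darker = higher.
rate_to_gray <- function(rate) {
  rng <- range(rate)
  if (rng[1] == rng[2]) return(rep(gray(0.6), length(rate)))  # all rates equal
  scaled <- (rate - rng[1]) / (rng[2] - rng[1])   # 0 = lowest, 1 = highest
  gray(1 - 0.8 * scaled)   # white for the lowest rate, dark gray for the highest
}

rate_to_gray(c(500, 1000, 1500))   # "#FFFFFF" "#999999" "#333333"
```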