Problem Set I




Before you begin, make sure that you have loaded the ggplot2 and MASS packages. Also install the ggthemes package with the install.packages command, then load it with the library or require command.

1. Open the built-in data frame called txhousing. Give five different ways of showing how many rows are in the data frame.

dim(txhousing)

## [1] 8602 9

nrow(txhousing)

## [1] 8602

str(txhousing)

## Classes 'tbl_df', 'tbl' and 'data.frame': 8602 obs. of 9 variables:

## $ city : chr "Abilene" "Abilene" "Abilene" "Abilene" ...

## $ year : int 2000 2000 2000 2000 2000 2000 2000 2000 2000 2000 ...

## $ month : int 1 2 3 4 5 6 7 8 9 10 ...

## $ sales : num 72 98 130 98 141 156 152 131 104 101 ...

## $ volume : num 5380000 6505000 9285000 9730000 10590000 ...

## $ median : num 71400 58700 58100 68600 67300 66900 73500 75000 64500 59300 ...

## $ listings : num 701 746 784 785 794 780 742 765 771 764 ...

## $ inventory: num 6.3 6.6 6.8 6.9 6.8 6.6 6.2 6.4 6.5 6.6 ...

## $ date : num 2000 2000 2000 2000 2000 ...

View(txhousing)

# Check the number of observations shown for txhousing in the Global Environment window
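A few further ways to get the row count are sketched below, in case alternatives to the answers above are useful. The object names (n_dim, n_length, n_NROW) are made up for illustration, and the sketch assumes ggplot2 is installed so that txhousing is available.

```r
library(ggplot2)  # provides the txhousing data frame

# Each expression returns the number of rows as a single number.
n_dim    <- dim(txhousing)[1]        # first element of dim() is the row count
n_length <- length(txhousing$city)   # every column has one entry per row
n_NROW   <- NROW(txhousing)          # like nrow(), but also accepts plain vectors

c(n_dim, n_length, n_NROW)
```

All three values should agree with nrow(txhousing).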

2. Using ggplot, make a histogram of the listings variable. Use your ggplot skills to make your plot aesthetically pleasing.

ggplot(txhousing, aes(listings)) + geom_histogram(fill = "goldenrod") + theme_bw() +
  xlab("Listings") + ylab("Frequency")

[pic]

You will see that the distribution of listings is skewed to the right. To see the shape of the distribution among the more common, smaller numbers of listings at the left side, you can use the xlim() command to restrict the range of the x-axis. Try adding the following term to your ggplot code: + xlim(0,10000).

ggplot(txhousing, aes(listings)) + geom_histogram(fill = "goldenrod") + theme_bw() +
  xlab("Listings") + ylab("Frequency") + xlim(0, 10000)

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## Warning: Removed 1917 rows containing non-finite values (stat_bin).

## Warning: Removed 2 rows containing missing values (geom_bar).

[pic]
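The warnings above arise because xlim() discards rows that fall outside the limits, along with rows where listings is missing, before the bins are computed. The sketch below counts those rows directly and shows one possible alternative, coord_cartesian(), which zooms the view without removing data. The binwidth of 250 is an arbitrary illustrative choice, not a recommended value, and dropped and p are made-up names. The count may differ from the warning shown above if your copy of txhousing is a different version.

```r
library(ggplot2)

# Rows that xlim(0, 10000) would discard: missing values or values above the limit.
dropped <- sum(is.na(txhousing$listings) | txhousing$listings > 10000)
dropped

# coord_cartesian() only zooms the plot; every row is still used for binning.
# binwidth = 250 is an assumed value chosen for illustration.
p <- ggplot(txhousing, aes(listings)) +
  geom_histogram(binwidth = 250, fill = "goldenrod", na.rm = TRUE) +
  coord_cartesian(xlim = c(0, 10000)) +
  theme_bw() +
  xlab("Listings") + ylab("Frequency")
p
```

Setting binwidth explicitly also silences the `stat_bin()` message about the default of 30 bins.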

3. Determine the standard deviation of the sales variable and assign it to an R object. Create another R object and assign it the value of the mean of the inventory variable.

sales.sd ................
................
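The original answer is cut off here. As a minimal sketch of one way to approach question 3, and not necessarily the author's solution, the code below uses sd() and mean() with na.rm = TRUE, on the assumption that the sales and inventory columns contain missing values. The name inventory.mean is hypothetical.

```r
library(ggplot2)  # provides the txhousing data frame

# na.rm = TRUE drops missing values first; if a column contains any NA
# and na.rm is left at its default of FALSE, the result is NA.
sales.sd <- sd(txhousing$sales, na.rm = TRUE)

# inventory.mean is a made-up object name for illustration.
inventory.mean <- mean(txhousing$inventory, na.rm = TRUE)

sales.sd
inventory.mean
```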
