STATS700-002 Topics in Statistics: Data Analysis

Homework 2: numpy, matplotlib and the Unix/Linux command line

Due November 22, 11:59 pm Worth 20 points

November 6, 2017

Instructions on writing and submitting your homework. See the instructions posted on Canvas or on the course webpage at . umich.edu/~klevin/teaching/Fall2017/STATS700-002/hw_instructions.html.

1 Plotting: CLTs and random walks (15 points)

In this problem, we'll make some plots using matplotlib and pyplot.

1. Let random variables $X_1, X_2, \dots, X_n$ be independent and identically distributed with mean $\mu = E X_1$ and finite variance $\sigma^2 = E(X_1 - \mu)^2 < \infty$, and denote their sample mean by $S_n = (X_1 + X_2 + \cdots + X_n)/n$. The classical central limit theorem states that as $n \to \infty$, $\sqrt{n}(S_n - \mu)$ converges in distribution (also called "in law") to a Gaussian random variable with mean 0 and variance $\sigma^2$. A natural question to ask is whether the distribution of the $X_i$ has an effect on how quickly $\sqrt{n}(S_n - \mu)$ starts to look like a normal. Let's explore this phenomenon. Choose four different probability distributions with mean 0 and variance 1. For example, I might choose standard normal, $(\mathrm{Bern}(p) - p)/\sqrt{p(1 - p)}$, $\mathrm{Poisson}(1) - 1$ and $\sqrt{12}(\mathrm{Unif}(0, 1) - 1/2)$ (you can check that all of these are mean 0 and variance 1), but feel free to choose more interesting or exotic distributions! See https: //docs.doc/scipy/reference/stats.html for a list of good choices.

(a) Pick a reasonably large value of $n$ (say, $n = 20$). Make a plot with four subplots, one for each of the four distributions. For each subplot, use Numpy/Scipy to generate $m = 1000$ independent draws of $\sqrt{n}(S_n - \mu)$ from that subplot's distribution and show in that subplot a (normalized) histogram of the empirical distribution of $\sqrt{n}(S_n - \mu)$.

(b) Title each of your subplots with the name of the distribution you used for that subplot. Make sure that all four of your plots have the same scales on their x- and y-axes.

(c) Add to each subplot a solid line indicating the density of the standard normal, so that we can see how "normal" the empirical distribution looks (you will need to import the scipy.stats module for this: scipy/reference/generated/scipy.stats.norm.html#scipy.stats.norm).

(d) Do all four of your plots look equally "normal"? Note that it may help to try a few different values of n to explore differences between the plots, depending on which four distributions you picked.
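One way the setup in parts (a) through (c) could be laid out is sketched below. It is only an illustration under stated assumptions: the Bernoulli parameter 0.3, the seed, the bin count, the plotting range of -4 to 4, and the names `samplers`, `scaled_means`, and `draws` are all hypothetical choices, not part of the assignment. The four distributions are the examples named in the problem statement; since each has mean 0 and variance 1, the scaled quantity reduces to the square root of n times the sample mean.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a Jupyter notebook
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)  # hypothetical seed, for reproducibility only
p = 0.3                         # hypothetical Bernoulli success probability

# Each sampler returns an array of the requested shape whose entries
# have mean 0 and variance 1.
samplers = {
    "Standard normal": lambda size: rng.standard_normal(size),
    "(Bern(0.3) - p) / sqrt(p(1 - p))":
        lambda size: (rng.binomial(1, p, size) - p) / np.sqrt(p * (1 - p)),
    "Poisson(1) - 1": lambda size: rng.poisson(1.0, size) - 1.0,
    "sqrt(12) (Unif(0, 1) - 1/2)":
        lambda size: np.sqrt(12) * (rng.uniform(0, 1, size) - 0.5),
}


def scaled_means(sampler, n, m):
    """Return m independent draws of sqrt(n) * (S_n - mu), with mu = 0."""
    x = sampler((m, n))                 # each row is one sample of size n
    return np.sqrt(n) * x.mean(axis=1)  # one scaled sample mean per row


n, m = 20, 1000
grid = np.linspace(-4, 4, 200)

# sharex/sharey give all four subplots the same axis scales.
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True, figsize=(8, 6))
draws = {}
for ax, (name, sampler) in zip(axes.flat, samplers.items()):
    draws[name] = scaled_means(sampler, n, m)
    # density=True normalizes the histogram so it is comparable to a density.
    ax.hist(draws[name], bins=30, range=(-4, 4), density=True, alpha=0.6)
    ax.plot(grid, stats.norm.pdf(grid), "k-")  # standard normal density
    ax.set_title(name)
fig.tight_layout()
```

Rerunning with smaller values of `n` (for example 2 or 5) is a quick way to explore part (d), since skewed or discrete distributions tend to show their shape more clearly when few terms are averaged.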

2. A random walk is a stochastic process that models a random path in which each step in the path is chosen randomly. The simplest of these is a walk on the integers, in which a walker starts at 0 and at each (discrete) time, the walker flips a coin and moves "right" by one integer if it lands heads and "left" by one if it lands tails. More formally, a random walk is a sequence of random variables $\{W_n\}_{n=0}^{\infty}$ in which $W_0 = 0$ and $W_n = \sum_{i=1}^{n} Z_i$, where $Z_1, Z_2, \dots, Z_n, \dots$ are i.i.d. random variables with $\Pr[Z_1 = 1] = p$ and $\Pr[Z_1 = -1] = 1 - p$. It should be intuitively clear that if $p = 1/2$, the walker will vacillate around 0 forever, but if $p \ne 1/2$, then the walker will eventually wander off to $+\infty$ or $-\infty$, according to whether $p > 1/2$ or $p < 1/2$. But how fast does this wandering happen? To get a very rough idea, let's compare a bunch of different walks in a plot.

(a) Write a function generate_random_walk that takes as arguments a Bernoulli success rate parameter $p$ and a number of steps $n \ge 0$ and returns a Numpy array of length $n + 1$ whose $t$-th entry corresponds to $W_t$, i.e., the location of the random walker at time $t$. Note: by definition, the walk should start at 0, so if your function returns array w, then it must be the case that w[0] == 0.

(b) Make a plot that shows the trajectories of five different independent random walks generated from your function, with success parameters

$p \in \{0.4, 0.45, 0.5, 0.55, 0.6\}$

and all having the same length parameter n = 200. Plot each of these five walks in a different color, and include a legend in your plot that indicates which color corresponds to which value of the parameter. Please include an appropriate title for your plot.
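The random walk definition above translates fairly directly into a cumulative sum of steps that are +1 with probability p and -1 otherwise. The sketch below is one possible shape for such a function and plot, not a reference solution: the function name `generate_random_walk` assumes the underscores intended in part (a), the optional `rng` argument is a hypothetical addition for reproducibility, and the seed, axis labels, and title are placeholders.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a Jupyter notebook
import matplotlib.pyplot as plt


def generate_random_walk(p, n, rng=None):
    """Return an array w of length n + 1 with w[0] == 0 and w[t] = W_t.

    p   : probability that a single step is +1 (otherwise the step is -1)
    n   : number of steps, n >= 0
    rng : optional numpy Generator (hypothetical extra argument)
    """
    if not 0 <= p <= 1:
        raise ValueError("p must lie in [0, 1]")
    if n < 0:
        raise ValueError("n must be nonnegative")
    rng = np.random.default_rng() if rng is None else rng
    # rng.random(n) is uniform on [0, 1), so each step is +1 with probability p.
    steps = np.where(rng.random(n) < p, 1, -1)
    # Prepend W_0 = 0, then accumulate the steps to get W_1, ..., W_n.
    return np.concatenate(([0], np.cumsum(steps)))


rng = np.random.default_rng(1)  # hypothetical seed
n = 200
fig, ax = plt.subplots()
for p in (0.4, 0.45, 0.5, 0.55, 0.6):
    # Each call to plot takes the next color from the default color cycle.
    ax.plot(np.arange(n + 1), generate_random_walk(p, n, rng), label=f"p = {p}")
ax.set_xlabel("time t")
ax.set_ylabel("position W_t")
ax.set_title("Simple random walks on the integers, n = 200")
ax.legend(title="success parameter")
```

Drawing all steps at once and using `np.cumsum` avoids a Python loop over time; the edge case n = 0 falls out naturally, since the result is then just the single entry 0.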

2 Using the Command Line (5 points)

For this problem, you'll need to sign on to the Fladoop cluster using either ssh (on UNIX/Linux/MacOS) or cygwin/PuTTY (on Windows). If you do not have a Fladoop account, you must request one (see Canvas for instructions or contact the instructor or your GSI). You do not have to write any Python code for this problem, but please copy-paste your shell commands and their outputs into your jupyter notebook. The cleanest way to do this is to copy-paste the contents of your shell (commands and the outputs) into jupyter "Raw NBConvert" cells. This will yield cells with monospace fonts, i.e., text that looks reasonably like code, and is differentiated from the non-code text in your notebook.

1. Using ssh/cygwin/PuTTY, sign on to the Fladoop cluster. This will require you to set up Duo two-factor authentication, if you have not done so already. What is the name of the machine you are signed on to? What directory are you placed in upon signing in?

2. List the contents of your home directory. Is there anything there?

3. Create a directory called "STATS700-002".

4. Change directory to move into the directory you just created and list its contents. Is there anything there?

5. In the directory STATS700-002, create a file called fox.txt containing only the string the quick brown fox jumps over the lazy dog.

6. List the contents of the current directory again. Your new file, fox.txt, should appear.

7. Print the contents of fox.txt to the shell.

8. Append the string What does the fox say? to the file fox.txt.

9. Print the contents of fox.txt to the shell to show that you have successfully appended the string.

10. How big is the file fox.txt? Who is the owner? When was it created? Hint: Read the man page for the command ls, in particular the -l flag.
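The problem itself asks for shell commands, so the following is only a cross-check for readers who want to see what answers to expect. It mirrors steps 3 through 10 in Python inside a temporary directory, which stands in for the home directory on the cluster (an assumption made so the sketch can run anywhere). It also assumes each string is written with a trailing newline, which depends on how the file is created in the shell. Note that the timestamp reported here, like the one in a long `ls` listing, is the last modification time.

```python
import tempfile
import time
from pathlib import Path

# A temporary directory stands in for the home directory (assumption).
with tempfile.TemporaryDirectory() as home:
    course_dir = Path(home) / "STATS700-002"
    course_dir.mkdir()                                     # step 3: create the directory
    before = sorted(q.name for q in course_dir.iterdir())  # step 4: list it (empty)

    fox = course_dir / "fox.txt"
    with fox.open("w", newline="\n") as f:                 # step 5: create the file
        f.write("the quick brown fox jumps over the lazy dog\n")
    after = sorted(q.name for q in course_dir.iterdir())   # step 6: list again

    with fox.open("a", newline="\n") as f:                 # step 8: append a line
        f.write("What does the fox say?\n")
    contents = fox.read_text()                             # steps 7 and 9: print contents

    info = fox.stat()                                      # step 10: file metadata
    size_bytes = info.st_size
    owner_uid = info.st_uid                                # numeric user id of the owner
    modified = time.ctime(info.st_mtime)                   # last modification time

print(before, after)
print(contents, end="")
print(size_bytes, owner_uid, modified)
```

The size is reported in bytes and counts the newline characters, so the number depends on whether each string was written with a trailing newline.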
