Course1.winona.edu



DSCI 210: Data Science – Python Take-home
Name: ____________________________ (2 pts)
Spring 2015
Points: 60

A Note about Appropriate Level of Assistance from Your Peers
My goal is to have this exam be a learning experience. You will learn by completing this exam. You can and certainly should seek assistance from your peers; however, copying their work is not appropriate. Much of the code has been provided to you on this exam; thus, checking your code is not necessary. This exam is about getting things done in Python. When providing screen captures of your work for this exam, include the In[ ] and Out[ ] prompt lines from the IPython console. This will provide some assurance that you’ve correctly executed the code in Python.

Problem #1
Consider the problem of counting blobs in a picture. Counting blobs falls within the field of computer vision. The Wiki page for computer vision is provided here.
Wiki: 
Blob1  Blob2  Blob3  Blob4
Download each blob image from our course website onto your local machine.

Load the following package and subpackages. Newer versions of scikit-image renamed the filter subpackage to filters; if the import on the second line fails, change filter to filters there and in the threshold_otsu call.
import numpy as np
from skimage import io, color, filter
from scipy import ndimage
from skimage.measure import label

Run the following code to use the io package to read in the Blob1.jpg image. The location will need to change depending on where you saved Blob1.jpg.
image = io.imread('C:/DSCI210/Blob1.jpg')

Plot the image inside Python using the following command.
io.imshow(image)
Turn-in: A screen capture of the image being plotted inside Python. (2 pts)

Next, convert the image to grey-scale using the following.
image = color.rgb2gray(image)
Turn-in: A screen capture of the grey-scale image being plotted inside Python. (2 pts)

The next step is to apply Otsu’s method to the grey-scale image. The updated image is a True/False image that is True for pixels whose grey-scale value is less than (darker than) the threshold value chosen by Otsu’s method, and False elsewhere.
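Because threshold_otsu can feel like a black box, a minimal NumPy sketch of the idea may help. Otsu’s method tries every split of the grey-level histogram into a "dark" class and a "light" class and keeps the split with the largest between-class variance, w_dark * w_light * (mean_dark - mean_light)**2. The function name otsu_threshold, the 256-bin histogram, and the toy image are illustrative assumptions; this is not scikit-image’s source code, and its value can differ slightly from filter.threshold_otsu because of binning details.

```python
import numpy as np


def otsu_threshold(gray, nbins=256):
    """Hypothetical helper: grey level that maximizes between-class variance."""
    # Histogram over the image's own range [min, max]
    hist, bin_edges = np.histogram(gray.ravel(), bins=nbins)
    centers = (bin_edges[:-1] + bin_edges[1:]) / 2
    hist = hist.astype(float)

    # Pixel counts of the dark class (bins 0..k) and light class (bins k..end)
    w_dark = np.cumsum(hist)
    w_light = np.cumsum(hist[::-1])[::-1]

    # Mean grey level of each class (first and last bins are never empty)
    mean_dark = np.cumsum(hist * centers) / w_dark
    mean_light = (np.cumsum((hist * centers)[::-1]) / w_light[::-1])[::-1]

    # Between-class variance for a split just after bin k
    between = w_dark[:-1] * w_light[1:] * (mean_dark[:-1] - mean_light[1:]) ** 2
    return centers[:-1][np.argmax(between)]


# Toy grey-scale image: light background with one dark 5x5 "blob"
gray = np.full((20, 20), 0.8)
gray[5:10, 5:10] = 0.2

t = otsu_threshold(gray)
binary = gray < t  # same comparison as the exam's Otsu line
print(t, binary.sum())
```

On a real photo you would pass the array returned by color.rgb2gray and compare the result with filter.threshold_otsu(image).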
image = image < filter.threshold_otsu(image)
Turn-in:
A screen capture of the image after Otsu’s method is applied. (1 pt)
A short discussion of what Otsu’s algorithm did to our image. (2 pts)
Wiki Page on Otsu’s Method: 

Finally, use the following code to count the number of blobs in this image. The distance transform replaces each blob (True) pixel with its distance to the nearest background (False) pixel, so pixels deep inside a blob receive the largest values. Keeping only pixels whose distance exceeds 80% of the largest distance leaves patches near the centers of the blobs, and label() numbers these connected patches, so the largest label is the number of patches found.
distance = ndimage.distance_transform_edt(image)
blob_centers = (distance > 0.80 * distance.max())
io.imshow(blob_centers)
print('The number of blobs in this image is', np.max(label(blob_centers)))
Turn-in: A screen capture of the blob_centers image inside Python and a screen capture of the output from the print() statement. (2 pts)

Problem #2
For this problem, use the Blob2.jpg image. Rerun all the code from Problem #1 using Blob2.jpg.
Turn-in: A screen capture of the blob_centers image for Blob2.jpg from inside Python and a screen capture of the output from the print() statement. (3 pts)

A review of the blob_centers image reveals that one blob was not counted by our algorithm. Change the following line of code and then recreate the blob_centers image.
Change
blob_centers = (distance > 0.80 * distance.max())
to
blob_centers = (distance > 0.50 * distance.max())
Turn-in:
A screen capture of the updated blob_centers image for Blob2.jpg from inside Python and a screen capture of the output from the print() statement. (2 pts)
What effect did this change have on the blob_centers image? How about on the count of the number of blobs? Discuss briefly. (2 pts)

Problem #3
For this problem, we will consider Blob3.jpg. This image has one discolored blob, which will prove difficult to count. Rerun all the code from Problem #1 using Blob3.jpg with the following change.
Change
blob_centers = (distance > 0.80 * distance.max())
to
blob_centers = (distance > 0.50 * distance.max())
Turn-in: A screen capture of the updated blob_centers image for Blob3.jpg from inside Python and a screen capture of the output from the print() statement. (2 pts)
The discolored blob is not counted.
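The counting step can be packaged as a function of the threshold fraction, which makes it easy to try 0.80, 0.50, 0.10 and smaller values on the same mask. This is a minimal sketch, not the exam’s code: count_blobs is a hypothetical helper, the test image is made up, and it swaps in scipy.ndimage.label with a 3x3 structuring element (8-connectivity) for skimage.measure.label, whose default in 2-D is also full (8-neighbour) connectivity in current versions. The input is assumed to be the True/False image produced by the Otsu step.

```python
import numpy as np
from scipy import ndimage


def count_blobs(mask, fraction):
    """Hypothetical helper: count blobs in a Boolean mask via the distance transform.

    mask     -- 2-D Boolean array, True on blob pixels (e.g. the Otsu output)
    fraction -- keep pixels whose distance exceeds fraction * max distance
    """
    # Distance from every True pixel to the nearest False (background) pixel
    distance = ndimage.distance_transform_edt(mask)
    # Keep only pixels that lie far enough inside a blob
    blob_centers = distance > fraction * distance.max()
    # Number the connected patches; a 3x3 block of ones gives 8-connectivity
    _, n_blobs = ndimage.label(blob_centers, structure=np.ones((3, 3), dtype=int))
    return blob_centers, n_blobs


# Made-up test mask: a large disk, a small disk, and a one-pixel speck
rows, cols = np.ogrid[:60, :100]
mask = ((rows - 30) ** 2 + (cols - 30) ** 2 <= 10 ** 2) | \
       ((rows - 30) ** 2 + (cols - 75) ** 2 <= 5 ** 2)
mask[5, 95] = True  # isolated noise pixel

for fraction in (0.80, 0.30, 0.01):
    _, n = count_blobs(mask, fraction)
    print('fraction', fraction, '->', n, 'blobs')
```

With these shapes the loop reports 1, 2 and 3 blobs: the 0.80 cut keeps only the large disk (much like the missed blob in Blob2.jpg), while the 0.01 cut keeps every True pixel, so even the one-pixel speck is counted as a blob.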
Let’s consider reducing the threshold value even more in determining blob_centers in an effort to get this discolored blob counted.
Change
blob_centers = (distance > 0.50 * distance.max())
to
blob_centers = (distance > 0.10 * distance.max())
Turn-in: A screen capture of the output from the print() statement using 0.10 as the threshold value. (2 pts)

Consider the following code, in which a sequence from 0.01 to 0.10 in steps of 0.01 is created. This sequence is then used in a for loop to cycle through these threshold values in an attempt to count the discolored blob. Run this code in Python.

A table for understanding the for loop:
Value of i | threshold_value | Code for blob_centers
i = 0 | threshold_value[0] = 0.01 | blob_centers = (distance > 0.01 * distance.max())
i = 1 | threshold_value[1] = 0.02 | blob_centers = (distance > 0.02 * distance.max())
... | ... | ...
i = 9 | threshold_value[9] = 0.10 | blob_centers = (distance > 0.10 * distance.max())

Turn-in:
The outcomes from the print() statement from the for loop. (2 pts)
A plot of the updated blob_centers image when using a threshold value of 0.01. (2 pts)
An explanation as to why the number of blobs is so high when using the threshold value of 0.01. (2 pts)

Problem #4
For this last problem, we will consider the effect of blob shape on counting blobs with this algorithm. Use the Blob4.jpg image for this problem. Run the code from Problem #1, varying the threshold value as was done in Problem #3, part c. The correct number of blobs is 8, but unfortunately our algorithm produces different counts depending on the threshold value.
Turn-in:
Create the blob_centers image for threshold_value = 0.25. Did our algorithm correctly identify each blob? Discuss briefly. (2 pts)
Create the blob_centers image for threshold_value = 0.40. What happened to cause our algorithm to count 9 blobs? Discuss briefly. (2 pts)
Create the blob_centers image for threshold_value = 0.45. How did our algorithm arrive at 7 blobs when counting? Discuss briefly. (2 pts)

Problem #5
We considered a simple and quick algorithm for counting the number of blobs in an image in Problems #1 - #4. Write a one-paragraph summary of the methodology used in this algorithm and its performance. This summary should be written for your manager, who does not understand anything about computer vision or the issues associated with counting blobs in images.
Turn-in: Your one-paragraph summary of this algorithm. (4 pts)

Problem #6
For this problem, we will be scraping US QuickFacts data from the following webpage.
Link: 
Use the following code to scrape the US QuickFacts data from the web address specified above.
Once the data has been successfully placed into data, a local DataFrame, use the following to specify the correct data type for the numeric columns.
#Change numeric data to numeric
data[['Pop2014','Pop2010','HS_Educ','Veterans','WorkTravelTime','MedianIncome','PopDensity']] = data[['Pop2014','Pop2010','HS_Educ','Veterans','WorkTravelTime','MedianIncome','PopDensity']].astype(float)

The following code was used to obtain the following tables.
Table 1 = Total estimated population for 2014 for each state, sorted from largest to smallest.
Table 2 = % change in population from 2010 to 2014 for each state, sorted from largest to smallest.
Turn-in:
A screen capture of the top 5 and bottom 5 states from Table 1. (4 pts)
A screen capture of the top 5 and bottom 5 states from Table 2. (4 pts)

Problem #7
For this problem, we will be scraping a second version of the US QuickFacts data.
Link: 
Consider the raw HTML code for this web page. Why is the following line of code going to fail? Discuss briefly. Note: You may find it useful to use Google Chrome’s Inspect Element feature.
mydata = soup.find_all("tr", {"class": "data"})
Turn-in: A brief discussion as to why this line of code will fail. (3 pts)

Make the following change to your code.
Change
mydata = soup.find_all("tr", {"class": "data"})
to
mydata = soup.find_all("tr")
This new line of code may appear to be correct; however, mydata contains a few garbage <tr> elements. Identify these garbage elements and explain where they came from.
Turn-in: A brief discussion of the issues remaining with mydata. (3 pts)

Consider the following code to drop the first and last elements of an array.
#Example of dropping the first and last elements of an array
a = np.array([0,1,2,4,5,6,7,8,9,10])
atrim = a[1:len(a)-1]
atrim

Modify the mydata array above so that it contains only the appropriate <tr> tags (find_all returns a list, and the same slicing works on it). Use this modified mydata array to create a DataFrame as was done in Problem #6. Create the following two tables using this DataFrame.

Table 3 = Average % of veterans for each state, sorted from largest to smallest. You should use Pop2010 for this, as Pop2014 is missing for a county in VA.
table3 = data.pivot_table('PCT_Veteran', index='State', aggfunc='mean')

Table 4 = Size of each state (in square miles), sorted from largest to smallest, using
$$\text{Population Density} = \frac{\text{Population}}{\text{Size}} \quad\Longrightarrow\quad \text{Size} = \frac{\text{Population}}{\text{Population Density}}$$
table4 = data.pivot_table('Size', index='State', aggfunc='sum')
Note: The summary above produces inf (infinity) for the size of Alaska, because one Alaska county has a population density of 0. You can ignore this issue for this exam.

Turn-in:
A screen capture of the top 5 and bottom 5 states from Table 3. (4 pts)
A screen capture of the top 5 and bottom 5 states from Table 4. (4 pts)