##############################################################################
# Anything on a line that starts with # is viewed as a comment
# by Python and is ignored when running the code. This code was written in
# version 3.4.2 of Python.
#
# All text in this document should be copied into a *.py Python file for use.
##############################################################################
# This program was written by Michael Babington for presentation at the
# Florida State University Library Scholars Commons Workshop entitled
# Exploring Digital Scholarships: Data, Texts, and Tools
#
# Any questions can be sent to: mb13m@my.fsu.edu
##############################################################################

# These are the packages that need to be imported
import requests
from bs4 import BeautifulSoup
import csv

# First we download the HTML code from the webpage and store it in a variable called r.
# NOTE: the address of the webpage is missing from this copy of the script.
# Put the address between the quotes below before running the code.
url = ""
r = requests.get(url)

# Next we use Beautiful Soup to parse the HTML code so it is "pretty" and easier for Python to read.
# Naming the parser explicitly ("html.parser" comes with Python) avoids a Beautiful Soup warning.
soup = BeautifulSoup(r.content, "html.parser")

# Uncomment the following line to look at the "pretty" HTML code
#print(soup.prettify())

# Declare a holding list d in which we will put the data we want to pull out of the HTML code
d = []

# The data that we want to pull from the webpage is in the 6th "table" tag in the HTML code.
# Since Python starts counting at 0, the 6th table has index 5.
data = soup("table")[5]

# This for loop goes over each item in data and places each row from our table into its own
# separate row in the list d
for item in data:
    try:
        d.append(item.contents)
        # Uncomment the print line to tell Python to print each row of the table. Note that all the
        # HTML code will still be in there, so it will look "messy"
        #print(item.contents)
    except Exception:
        pass
# The try: command tells Python to try to do what is on the following lines. If for some reason that line
# fails to run, it continues to the except command, which tells Python to skip over the error and continue
# running the file.

# Now each row of d contains one row of the table. In the HTML code each row corresponds to a <tr> tag.
# For example, the first <tr> only contains "Team Statistics". If we wanted to view that row of the data we type
#d[0]
# The second, third, and fourth rows contain either no information or the headers of the columns. The fifth row
# is the first row that contains the actual information we want to pull out. To see it we can type
#d[4]
# It still contains all the HTML code, but we can see it has the information we want. We can also see that each
# item in this row is separated by a comma. If we wanted to get the date, which is the second entry in row 5,
# we would type
#d[4][1]
# which tells Python to go to the fifth row and the second element. The opponent played is contained in the
# fourth entry in the row
#d[4][3]
##############################################################################
# Continuing this exercise for the sixth row, which contains the next week of game data, we have the exact
# same pattern. The date is contained in
#d[5][1]
# And the opponent is in
#d[5][3]
##############################################################################
# Continuing this process we can tell that every other entry has the data we want to extract. Now we need to
# write a loop to go over each row and extract the data in an easy to read manner.
# Before we do that we create holding lists for each column of data we want. These correspond to the columns
# of data from the table we are extracting from
c1 = ['Date']
c3 = ['Opponent']
c5 = ['Rushing Attempts']
c7 = ['Rushing Yards']
c9 = ['Rushing Touchdowns']
c11 = ['Longest Rush']
c13 = ['Number Receptions']
c15 = ['Receiving Yards']
c17 = ['Receiving Touchdowns']
c19 = ['Longest Reception']
c21 = ['Completions-Attempts-Interceptions']
c23 = ['Passing Yards']
c25 = ['Passing Touchdowns']
c27 = ['Longest Pass']
c29 = ['Number of Kick Returns']
c31 = ['Kick Return Yards']
c33 = ['Kick Return Touchdowns']
c35 = ['Longest Kick Return']
c37 = ['Number of Punt Returns']
c39 = ['Punt Return Yards']
c41 = ['Longest Punt Return']
c43 = ['Total Offensive Yards']

# The loop tells Python to start at row 5 and continue for the entire length of d, taking each item and
# placing it in the correct holding list that we assigned above.
# The .text portion only takes the text out of the HTML code and ignores all other data.
# For example, uncomment the next line to see what Python is pulling from the HTML code.
#d[4][1].text
# Note that this includes the unicode character \xa0 (a non-breaking space), which will not be visible
# when exporting to a .csv file
for i in range(4, len(d)):
    try:
        c1.append(d[i][1].text)    # Date
    except Exception:
        pass
    try:
        c3.append(d[i][3].text)    # Opponent
    except Exception:
        pass
    try:
        c5.append(d[i][5].text)    # Rushing attempts
    except Exception:
        pass
    try:
        c7.append(d[i][7].text)    # Rushing yards
    except Exception:
        pass
    try:
        c9.append(d[i][9].text)    # Rushing touchdowns
    except Exception:
        pass
    try:
        c11.append(d[i][11].text)  # Longest rush
    except Exception:
        pass
    try:
        c13.append(d[i][13].text)  # Number of receptions
    except Exception:
        pass
    try:
        c15.append(d[i][15].text)  # Receiving yards
    except Exception:
        pass
    try:
        c17.append(d[i][17].text)  # Receiving touchdowns
    except Exception:
        pass
    try:
        c19.append(d[i][19].text)  # Longest reception
    except Exception:
        pass
    try:
        c21.append(d[i][21].text)  # Completions-Attempts-Interceptions
    except Exception:
        pass
    try:
        c23.append(d[i][23].text)  # Passing yards
    except Exception:
        pass
    try:
        c25.append(d[i][25].text)  # Passing touchdowns
    except Exception:
        pass
    try:
        c27.append(d[i][27].text)  # Longest pass
    except Exception:
        pass
    try:
        c29.append(d[i][29].text)  # Number of kick returns
    except Exception:
        pass
    try:
        c31.append(d[i][31].text)  # Kick return yards
    except Exception:
        pass
    try:
        c33.append(d[i][33].text)  # Kick return touchdowns
    except Exception:
        pass
    try:
        c35.append(d[i][35].text)  # Longest kick return
    except Exception:
        pass
    try:
        c37.append(d[i][37].text)  # Number of punt returns
    except Exception:
        pass
    try:
        c39.append(d[i][39].text)  # Punt return yards
    except Exception:
        pass
    try:
        c41.append(d[i][41].text)  # Longest punt return
    except Exception:
        pass
    try:
        c43.append(d[i][43].text)  # Total offensive yards
    except Exception:
        pass

# Next we combine all the data into one large data matrix to export
output = [c1, c3, c5, c7, c9, c11, c13, c15, c17, c19, c21, c23, c25, c27, c29, c31, c33, c35, c37, c39, c41, c43]

# This transposes the dataset so the variable names are on the top and the observations are the rows
finaldata = list(map(list, zip(*output)))

# This is a function that writes our data to a .csv file so we can open the data in Excel (or another program)
def csv_writer(data, path):
    """
    Write data to the CSV file at path
    """
    with open(path, "w", newline='') as csv_file:
        writer = csv.writer(csv_file, delimiter=',')
        for line in data:
            writer.writerow(line)

# These lines call the function and write the data to FSUFootballData.csv
# This will be saved in the same folder you have the Python file in
if __name__ == "__main__":
    path = "FSUFootballData.csv"
    csv_writer(finaldata, path)
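
# The transposing step can be easier to follow on a tiny example. The sketch below is
# only an illustration: the column names date_col, opponent_col and short_col and all
# of their values are made up here and do not come from the webpage. It shows how
# zip(*output) turns a list of columns into a list of rows, how those rows look once
# written as CSV, and that zip stops at the shortest column if a value is missing.

```python
import csv
import io

# Made-up columns in the same shape as c1, c3, ... (header first, then the values)
date_col = ['Date', '09/01', '09/08']
opponent_col = ['Opponent', 'Team A', 'Team B']

output = [date_col, opponent_col]

# zip(*output) pairs up the first items, then the second items, and so on,
# so each resulting list is one row of the final table
rows = list(map(list, zip(*output)))
print(rows)

# Write the rows to an in-memory buffer instead of a file so nothing is saved to disk
buffer = io.StringIO(newline='')
writer = csv.writer(buffer, delimiter=',')
for line in rows:
    writer.writerow(line)
print(buffer.getvalue())

# If one column is missing a value, zip stops at the shortest column,
# so the last date is dropped here
short_col = ['Rushing Yards', '120']
truncated = list(zip(date_col, short_col))
print(truncated)
```

# Because of this, a column that skipped a value inside a try/except block would
# shorten the exported table, so it may be worth checking that all columns have
# the same length before writing the file.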