
##############################################################################
# Anything on a line that starts with # is treated as a comment by Python
# and is ignored when the code runs. This code was written in version 3.4.2
# of Python.
#
# All text in this document should be copied into a *.py Python file for use.
##############################################################################
# This program was written by Michael Babington for presentation at the
# Florida State University Library Scholars Commons Workshop entitled
# Exploring Digital Scholarships: Data, Texts, and Tools
#
# Any questions can be sent to: mb13m@my.fsu.edu
##############################################################################

# These are the packages that need to be imported
import requests
from bs4 import BeautifulSoup
import csv

# First we download the HTML code from the webpage and store it in a variable called r.
# The web address is missing from this copy of the script, so put the address of the
# page you want to scrape between the quotes below.
url = ""
r = requests.get(url)

# Next we use Beautiful Soup to parse the HTML code into a form that is easier for Python to search.
# Naming the parser ("html.parser" ships with Python) avoids a warning from Beautiful Soup.
soup = BeautifulSoup(r.content, "html.parser")

# Uncomment the following line to look at the "pretty" HTML code
#print(soup.prettify())

# Declare a holding array (a Python list) d that we will put the data we want to pull out of the HTML code in
d = []

# The data that we want to pull from the webpage is in the 6th "table" tag in the HTML code.
# Since Python starts counting at 0, the 6th table has index 5.
data = soup("table")[5]

# This for loop goes over each item in data and places each row from our table into its own separate row in the array d
for item in data:
    try:
        d.extend([item.contents])
        # Uncomment the print line to tell Python to print each row of the table. Note that all the HTML code will
        # still be in there so it will look "messy"
        #print(item.contents)
    except Exception:
        pass

# The try: command tells Python to try to run the lines that follow it. If one of those lines fails to run,
# Python moves on to the except command, which here tells it to skip over the error and continue running the file.
# Now each row of d contains one row of the table; in the HTML code each row corresponds to a <tr> tag.
# For example, the first <tr> only contains "Team Statistics". If we wanted to view that row of the data we type
#d[0]
# The second, third, and fourth rows contain either no information or the headers of the columns. The fifth row
# is the first row that contains the actual information we want to pull out. To see it we can type
#d[4]
# It still contains all the HTML code, but we can see it has the information we want. We can also see that each item
# in this row is separated by a comma. If we wanted to get the data from row 5, second entry, we would type
#d[4][1]
# which tells Python to go to the fifth row and the second element. The opponent played is contained in the fourth
# entry in the row
#d[4][3]
##############################################################################
# Continuing this exercise for the sixth row, which contains the next week of game data, we have the exact same
# pattern. The date is contained in
#d[5][1]
# And the opponent is in
#d[5][3]
##############################################################################
# Continuing this process we can tell that every other entry has the data we want to extract. Now we need to write a loop
# to go over each row and extract the data in an easy to read manner.
# Before we do that we create holding arrays for each column of data we want.
# These correspond to the columns of data from the table we are extracting from
c1 = ['Date']
c3 = ['Opponent']
c5 = ['Rushing Attempts']
c7 = ['Rushing Yards']
c9 = ['Rushing Touchdowns']
c11 = ['Longest Rush']
c13 = ['Number Receptions']
c15 = ['Receiving Yards']
c17 = ['Receiving Touchdowns']
c19 = ['Longest Receptions']
c21 = ['Completions-Attempts-Interceptions']
c23 = ['Passing Yards']
c25 = ['Passing Touchdowns']
c27 = ['Longest Pass']
c29 = ['Number of Kick Returns']
c31 = ['Kick Return Yards']
c33 = ['Kick Return Touchdowns']
c35 = ['Longest Kick Return']
c37 = ['Number of Punt Returns']
c39 = ['Punt Return Yards']
c41 = ['Longest Punt Return']
c43 = ['Total Offensive Yards']

# The loop tells Python to start at row 5 and continue for the entire length of d, taking each item and placing it
# in the correct holding array that we assigned above.
# The .text portion only takes the text out of the HTML code and ignores all other data.
# For example, uncomment the next line to see what Python is pulling from the HTML code.
#d[4][1].text
# Note that this includes a unicode character \xa0 (a non-breaking space), which shows up only as a blank
# when the exported .csv file is opened
for i in range(4, len(d)):
    try:
        c1.extend([d[i][1].text])  # Date
    except Exception:
        pass
    try:
        c3.extend([d[i][3].text])  # Opponent
    except Exception:
        pass
    try:
        c5.extend([d[i][5].text])  # Rushing Attempts
    except Exception:
        pass
    try:
        c7.extend([d[i][7].text])  # Rushing Yards
    except Exception:
        pass
    try:
        c9.extend([d[i][9].text])  # Rushing Touchdowns
    except Exception:
        pass
    try:
        c11.extend([d[i][11].text])  # Longest Rush
    except Exception:
        pass
    try:
        c13.extend([d[i][13].text])  # Number of Receptions
    except Exception:
        pass
    try:
        c15.extend([d[i][15].text])  # Receiving Yards
    except Exception:
        pass
    try:
        c17.extend([d[i][17].text])  # Receiving Touchdowns
    except Exception:
        pass
    try:
        c19.extend([d[i][19].text])  # Longest Reception
    except Exception:
        pass
    try:
        c21.extend([d[i][21].text])  # Completions-Attempts-Interceptions
    except Exception:
        pass
    try:
        c23.extend([d[i][23].text])  # Passing Yards
    except Exception:
        pass
    try:
        c25.extend([d[i][25].text])  # Passing Touchdowns
    except Exception:
        pass
    try:
        c27.extend([d[i][27].text])  # Longest Pass
    except Exception:
        pass
    try:
        c29.extend([d[i][29].text])  # Number of Kick Returns
    except Exception:
        pass
    try:
        c31.extend([d[i][31].text])  # Kick Return Yards
    except Exception:
        pass
    try:
        c33.extend([d[i][33].text])  # Kick Return Touchdowns
    except Exception:
        pass
    try:
        c35.extend([d[i][35].text])  # Longest Kick Return
    except Exception:
        pass
    try:
        c37.extend([d[i][37].text])  # Number of Punt Returns
    except Exception:
        pass
    try:
        c39.extend([d[i][39].text])  # Punt Return Yards
    except Exception:
        pass
    try:
        c41.extend([d[i][41].text])  # Longest Punt Return
    except Exception:
        pass
    try:
        c43.extend([d[i][43].text])  # Total Offensive Yards
    except Exception:
        pass

# Next we combine all the data into one large data matrix to export
output = [c1, c3, c5, c7, c9, c11, c13, c15, c17, c19, c21, c23, c25, c27, c29, c31, c33, c35, c37, c39, c41, c43]

# This transposes the dataset so the variable names are on the top and the observations are the rows.
# Wrapping map() in list() stores the rows, so they can be read more than once.
finaldata = list(map(list, zip(*output)))

# This function writes our data to a .csv file so we can open the data in Excel (or another program)
def csv_writer(data, path):
    """
    Write data to a CSV file at path
    """
    with open(path, "w", newline='') as csv_file:
        writer = csv.writer(csv_file, delimiter=',')
        for line in data:
            writer.writerow(line)

# These lines call the function and write the data to FSUFootballData.csv
# The file will be saved in the folder you run the Python file from
if __name__ == "__main__":
    path = "FSUFootballData.csv"
    csv_writer(finaldata, path)
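
A minimal sketch of the transposition step, using made-up sample columns rather than real game data. The names dates, opponents, yards, columns, rows_truncated, and rows_padded are hypothetical and do not appear in the script. It illustrates that zip(*output) stops at the shortest column, so if one of the try/except blocks skipped a value, the trailing rows are silently dropped. itertools.zip_longest is shown as one possible alternative that pads the short column instead; it is not part of the original program.

```python
import csv
import io
from itertools import zip_longest

# Made-up sample columns (not real game data); each list starts with its header.
dates = ["Date", "Sep 5", "Sep 12"]
opponents = ["Opponent", "Team A", "Team B"]
yards = ["Rushing Yards", "150"]  # one value missing, as if a try/except skipped it

columns = [dates, opponents, yards]

# zip(*columns) turns columns into rows, but it stops at the shortest column,
# so the "Sep 12" row is lost here.
rows_truncated = [list(row) for row in zip(*columns)]

# zip_longest pads the short column with a fill value instead of dropping rows.
rows_padded = [list(row) for row in zip_longest(*columns, fillvalue="")]

# Write the padded rows to an in-memory buffer to preview the CSV text
# without creating a file on disk.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=",")
writer.writerows(rows_padded)
csv_text = buffer.getvalue()

print(len(rows_truncated), "rows with zip,", len(rows_padded), "rows with zip_longest")
print(csv_text)
```

Whether padding or dropping incomplete rows is the better choice depends on the data; padding keeps every game but can shift a value into the wrong row if a value was skipped in the middle of a column, so the output is worth checking against the web page.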