Scraping class Documentation - Read the Docs
Scraping class Documentation
Release 0.1 IRE/NICAR
Jul 31, 2019
Contents
1 What you will make
2 Prelude: Prerequisites
2.1 Command-line interface
2.2 Text editor
2.3 Python
2.4 pip
3 Act 1: The command line
3.1 Print the current directory
3.2 List files in a directory
3.3 Change directories
3.4 Creating directories and files
3.5 Deleting directories and files
4 Act 2: Python
4.1 How to run a Python program
4.2 Variables
4.3 Data types
4.4 Control structures
4.5 Python as a toolbox
5 Act 3: Web scraping
5.1 Installing dependencies
5.2 Analyzing the HTML
5.3 Extracting an HTML table
5.4 But that's not all: Getting the missing data
A step-by-step guide to writing a web scraper with Python. The course assumes little experience with Python or the command line and covers a number of fundamental skills that can be applied to other problems. This guide was initially developed by Chase Davis, Jackie Kazil, Sisi Wei and Matt Wynn for bootcamps held by Investigative Reporters and Editors at the University of Missouri in Columbia, Missouri, in 2013 and 2014. It was modified by Ben Welsh in December 2014 for workshops at The Centre for Cultura Contemporania de Barcelona, Medialab-Prado and the Escuela de Periodismo y Comunicación at Universidad Rey Juan Carlos.
• Code repository: ireapps/first-web-scraper/
• Documentation: first-web-scraper.
• Issues: ireapps/first-web-scraper/issues/
1 What you will make
This tutorial will guide you through the process of writing a Python script that can extract the roster of inmates at the Boone County Jail in Missouri from a local government website and save it as comma-delimited text ready for analysis.
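To make the goal concrete, here is a minimal sketch of the general idea: read an HTML table and write its rows out as comma-delimited text. It is not the script this tutorial builds. It uses only Python's standard library, and the sample markup, the names in it, and the helpers `TableParser` and `table_to_csv` are all made up for illustration. The real jail roster page is not shown here and its structure may differ.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample markup standing in for a roster page.
# The names and columns are invented for this sketch.
SAMPLE_HTML = """
<table>
  <tr><th>Last name</th><th>First name</th></tr>
  <tr><td>Doe</td><td>Jane</td></tr>
  <tr><td>Roe</td><td>Richard</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_csv(html):
    """Return the table rows found in `html` as comma-delimited text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


csv_text = table_to_csv(SAMPLE_HTML)
print(csv_text)
```

The sketch parses a string held in memory rather than fetching a live page, so it runs without a network connection or any third-party packages.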