
Finishing Regular Expressions &

XML / Web Scraping

April 7, 2015

CSCI 0931 -- Intro. to Comp. for the Humanities and Social Sciences

Today

• Finish Regular Expressions

• XML Parsing in Python

• Course Calendar


Last Class

• Introduced iterators and match groups

• Iterators let you loop through a set of match results without creating a big list up front

• Match groups (the parenthesis notation) let you find the text that matches smaller parts of a regex pattern
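
A minimal sketch of these two ideas follows. The sample text and variable names are made up for illustration and are not taken from the lecture. `re.findall` builds the whole list of results at once, while `re.finditer` hands back match objects one at a time; a parenthesized group then picks out part of each match.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "cat bat rat"

# findall builds the complete list of results up front.
# With one group in the pattern, it returns that group's text for each match.
all_at_once = re.findall(r"(\w)at", text)

# finditer returns an iterator: match objects are produced one at a time.
matches = re.finditer(r"(\w)at", text)
first = next(matches)           # ask the iterator for just the first match

whole_first = first.group(0)    # the entire matched text
letter_first = first.group(1)   # the text matched by the parenthesized group

# Looping continues from where next() left off.
remaining = [m.group(1) for m in matches]

print(all_at_once)                # ['c', 'b', 'r']
print(whole_first, letter_first)  # cat c
print(remaining)                  # ['b', 'r']
```

An iterator can only be walked through once; to go over the matches again, call `re.finditer` again.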


Last Class

• re.search('a(\w+)a', 'mechanical')

• re.match('m(..)', 'mechanical')

• group(0): matched string (in red)

• start(0) or end(0): start or end position of the match

• group(1): matched 1st parenthesis group (in bold)

• re.finditer('\w+', 'This is a text.')

Match 1 (This), Match 2 (is), Match 3 (a), Match 4 (text)
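
The calls on this slide can be tried directly in Python. The sketch below uses the slide's own patterns and strings; the variable names are illustrative, and the patterns are written as raw strings (the `r` prefix) so that the backslash in `\w` reaches the regex engine unchanged.

```python
import re

word = "mechanical"

# re.search scans the whole string for the first place the pattern matches.
found = re.search(r"a(\w+)a", word)
search_whole = found.group(0)                  # full matched string
search_inner = found.group(1)                  # text inside the parentheses
search_span = (found.start(0), found.end(0))   # where the match begins and ends

# re.match only tries to match at the very beginning of the string.
prefix = re.match(r"m(..)", word)
match_whole = prefix.group(0)
match_inner = prefix.group(1)

# re.finditer yields one match object per word, in order.
numbered = [
    (i, m.group(0))
    for i, m in enumerate(re.finditer(r"\w+", "This is a text."), start=1)
]

print(search_whole, search_inner, search_span)  # anica nic (4, 9)
print(match_whole, match_inner)                 # mec ec
print(numbered)  # [(1, 'This'), (2, 'is'), (3, 'a'), (4, 'text')]
```

Both `re.search` and `re.match` return `None` when nothing matches, so real code usually checks the result before calling `group`.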


Data Structures

Lists

content: 'a' 'b' 'c' 'd' 'e' 'f' 'g' 'h' 'i' 'j'

indices: 0 1 2 3 4 5 6 7 8 9

Dictionaries (keys & values)

Key: 'alice' 'bob' 'carol'

Value: '401-111-1111' '401-222-2222' '401-333-3333'

Iterators

Match 1, Match 2, Match 3, Match 4

Match Objects

• Matched String

• Matched String Start

• Matched String End
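
As a rough sketch, the four data structures on this slide look like this in Python. The values come from the slide, but which phone number belongs to which name is an assumption, and the variable names are illustrative.

```python
import re

# List: ordered items, looked up by an integer index that starts at 0.
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
first_letter = letters[0]
last_letter = letters[9]

# Dictionary: each key maps to a value.
# ASSUMPTION: the name-to-number pairing below is a guess for illustration.
phone_book = {
    'alice': '401-111-1111',
    'bob': '401-222-2222',
    'carol': '401-333-3333',
}
bobs_number = phone_book['bob']

# Iterator: hands out one match object at a time.
matches = re.finditer(r"\w+", "This is a text.")

# Match object: holds the matched string plus its start and end positions.
m = next(matches)
summary = (m.group(0), m.start(0), m.end(0))

print(first_letter, last_letter)  # a j
print(bobs_number)                # 401-222-2222
print(summary)                    # ('This', 0, 4)
```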
