Google Earth - Brown University


April 14, 2016

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences


Web Scraping Introduction

• Why isn't there a nice "importXML" in Python?

• There is something close: the xml.etree.ElementTree module in Python's standard library
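Google Sheets offers an IMPORTXML function, which is presumably what "importXML" refers to here (an assumption). Below is a minimal sketch of something similar in Python, assuming the XML is already in a string rather than fetched from a URL. The helper name import_xml is hypothetical and not part of any library. ElementTree's findall accepts a limited subset of XPath, so "car/year" means "year elements inside car elements under the root".

```python
import xml.etree.ElementTree as et


def import_xml(contents, path):
    """Hypothetical helper (not a library function): parse an XML string and
    return the text of every element matching an ElementTree path."""
    root = et.fromstring(contents)
    return [element.text for element in root.findall(path)]


# Assumed sample data; the tag names mirror the slides' example
sample = "<cars><car><year>2010</year></car><car><year>2012</year></car></cars>"
print(import_xml(sample, "car/year"))  # ['2010', '2012']
```

Unlike IMPORTXML, this sketch does no downloading; fetching the page is a separate step.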


Using xml.etree.ElementTree

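import xml.etree.ElementTree as et

filename = 'example.xml'

# Read the whole XML file into one string
with open(filename, 'r') as file:
    contents = file.read()

# Parse the string; the result is the root element of the tree
tree = et.fromstring(contents)
print(tree.tag)  # name of the root tag

# For every <car> under the root, print its year and color
for node in tree.findall('car'):
    for subnode in node.findall('year'):
        print(subnode.tag + ': ' + subnode.text)
    for subnode in node.findall('color'):
        print(subnode.tag + ': ' + subnode.text)
    print('---')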

contents of example.xml (three car elements, each with a year and a color):

2010 black
2012 red
2014 yellow
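Because example.xml itself is not reproduced here, a self-contained variant can embed the XML in a string. The markup below is an assumed reconstruction: the root tag name cars is a guess, while the three car elements and their year and color values come from the contents shown above.

```python
import xml.etree.ElementTree as et

# Assumed stand-in for example.xml: the root tag "cars" is hypothetical;
# the car/year/color structure and values follow the slide.
contents = """<cars>
  <car><year>2010</year><color>black</color></car>
  <car><year>2012</year><color>red</color></car>
  <car><year>2014</year><color>yellow</color></car>
</cars>"""

tree = et.fromstring(contents)
print(tree.tag)  # cars

# Same loop as the slide: year and color of each car, separated by '---'
for node in tree.findall('car'):
    for subnode in node.findall('year'):
        print(subnode.tag + ': ' + subnode.text)
    for subnode in node.findall('color'):
        print(subnode.tag + ': ' + subnode.text)
    print('---')
```

When reading from disk, et.parse(filename).getroot() is an alternative that returns the root element directly, without reading the file into a string first.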



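et.fromstring(contents) turns the string into Python's internal representation of the XML, a tree of elements, and returns its root element.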
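To see what that internal representation looks like, the sketch below inspects the root element directly. The small XML string and its cars root tag are assumptions for illustration.

```python
import xml.etree.ElementTree as et

# Assumed sample data; the root tag "cars" is hypothetical
contents = ("<cars>"
            "<car><year>2010</year><color>black</color></car>"
            "<car><year>2012</year><color>red</color></car>"
            "</cars>")

tree = et.fromstring(contents)   # an Element object, not a string
print(type(tree).__name__)       # Element
print(tree.tag)                  # cars
print(len(tree))                 # 2 (number of direct children)

# An Element behaves like a list of its children
for child in tree:
    print(child.tag, [grandchild.tag for grandchild in child])
```

Each line of the final loop prints car followed by ['year', 'color'], showing the nesting that findall navigates.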


tree.findall('car') gives back a list of the "car" nodes that are direct children of the root.
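Since findall returns an ordinary Python list, the usual list operations apply. A minimal sketch, using an assumed inline copy of the data (the root tag cars is hypothetical):

```python
import xml.etree.ElementTree as et

# Assumed sample data; the root tag "cars" is hypothetical
contents = ("<cars>"
            "<car><year>2010</year><color>black</color></car>"
            "<car><year>2012</year><color>red</color></car>"
            "<car><year>2014</year><color>yellow</color></car>"
            "</cars>")

tree = et.fromstring(contents)
cars = tree.findall('car')     # list of Element objects
print(len(cars))               # 3
print(cars[0][0].text)         # first child of the first car: 2010
print(tree.findall('truck'))   # no matching children: []
```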


node.findall('year') searches for "year" tags inside the current "car" node.
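One detail worth knowing: findall with a plain tag name only looks at the direct children of the element it is called on, which is why the search for "year" starts from each car rather than from the root. ElementTree's path syntax also accepts './/year' to search at any depth. The sample data below is assumed.

```python
import xml.etree.ElementTree as et

# Assumed sample data; the root tag "cars" is hypothetical
contents = ("<cars>"
            "<car><year>2010</year><color>black</color></car>"
            "<car><year>2012</year><color>red</color></car>"
            "<car><year>2014</year><color>yellow</color></car>"
            "</cars>")

tree = et.fromstring(contents)
print(tree.findall('year'))           # []: year is not a direct child of the root
print(len(tree.findall('.//year')))   # 3: './/' searches all descendants

first_car = tree.findall('car')[0]
print([y.text for y in first_car.findall('year')])   # ['2010']
```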


node.findall('year') gives back a list of "year" nodes; in this file each car has exactly one.
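When each car has at most one year, Element.find (first match, or None) and Element.findtext (text of the first match, or a default) can replace the inner loop. A sketch on assumed sample data; the price tag is a hypothetical missing field used only to show the fallback behavior.

```python
import xml.etree.ElementTree as et

# Assumed sample data; the root tag "cars" is hypothetical
contents = ("<cars>"
            "<car><year>2010</year><color>black</color></car>"
            "<car><year>2012</year><color>red</color></car>"
            "</cars>")

tree = et.fromstring(contents)
car = tree.findall('car')[1]           # the second car

print(car.find('year').text)           # 2012
print(car.find('price'))               # None: there is no <price> child
print(car.findtext('color'))           # red
print(car.findtext('price', 'n/a'))    # n/a: default used when missing
```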


subnode.tag gives back the tag name (e.g. year), and subnode.text gives back the text inside the tag (e.g. 2010).
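Because .text is always a string (or None for an element with no text), numeric fields such as the year need converting before any arithmetic. A minimal sketch that collects the data into a list of (year, color) pairs, on assumed sample data with a hypothetical cars root tag:

```python
import xml.etree.ElementTree as et

# Assumed sample data; the root tag "cars" is hypothetical
contents = """<cars>
  <car><year>2010</year><color>black</color></car>
  <car><year>2012</year><color>red</color></car>
  <car><year>2014</year><color>yellow</color></car>
</cars>"""

tree = et.fromstring(contents)
# Whitespace between tags also counts as text
print(repr(tree.text))   # '\n  '

cars = []
for node in tree.findall('car'):
    year = int(node.find('year').text)       # '2010' -> 2010
    color = node.find('color').text.strip()  # strip() guards against stray whitespace
    cars.append((year, color))

print(cars)   # [(2010, 'black'), (2012, 'red'), (2014, 'yellow')]
```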
