
Generators & Coroutines

Edited Version of Slides by David Beazley



Introduction to Iterators and Generators

Part 1: Iterators

Monday, May 16, 2011

Copyright (C) 2008, http://www.dabeaz.com

Iteration

* As you know, Python has a "for" statement

* You use it to loop over a collection of items

>>> for x in [1,4,5,10]:
...     print x,
...
1 4 5 10
>>>

* And, as you have probably noticed, you can iterate over many different kinds of objects (not just lists)


Iterating over a Dict

* If you loop over a dictionary you get keys

>>> prices = { 'GOOG' : 490.10,
...            'AAPL' : 145.23,
...            'YHOO' : 21.71 }
...
>>> for key in prices:
...     print key
...
YHOO
GOOG
AAPL
>>>
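As a small sketch of the same idea in Python 3 syntax (an assumption; the slides use Python 2), the loop below collects the keys, and also uses the standard dict methods values() and items(), which the slide does not mention. The key order printed on the slide reflects older Python versions; Python 3.7 and later keep insertion order.

```python
prices = {'GOOG': 490.10, 'AAPL': 145.23, 'YHOO': 21.71}

# Looping over the dict itself yields its keys
keys = [key for key in prices]

# values() and items() are also iterable (not shown on the slide)
values = [value for value in prices.values()]
pairs = [(key, value) for key, value in prices.items()]

print(keys)
```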


Iterating over a String

* If you loop over a string, you get characters

>>> s = "Yow!"
>>> for c in s:
...     print c
...
Y
o
w
!
>>>

Consuming Iterables

* Many functions consume an "iterable" object

* Reductions:

sum(s), min(s), max(s)

* Constructors

list(s), tuple(s), set(s), dict(s)

* in operator

item in s

* Many others in the library
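A short sketch of these consumers applied to one list, in Python 3 syntax (an assumption; the slides use Python 2). The variable names are illustrative only. Note that dict() needs an iterable of key/value pairs rather than plain numbers.

```python
s = [3, 1, 4, 1, 5]

# Reductions
total = sum(s)
smallest = min(s)
largest = max(s)

# Constructors
as_list = list(s)
as_tuple = tuple(s)
as_set = set(s)                          # duplicates collapse
as_dict = dict([('a', 1), ('b', 2)])     # iterable of (key, value) pairs

# in operator
has_four = 4 in s
has_nine = 9 in s
```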


Iterating over a File

* If you loop over a file you get lines

>>> for line in open("real.txt"):
...     print line,
...
Real Programmers write in FORTRAN

Maybe they do now,
in this decadent era of
Lite beer, hand calculators, and "user-friendly" software
but back in the Good Old Days,
when the term "software" sounded funny
and Real Computers were made out of drums and vacuum tubes,
Real Programmers wrote in machine code.
Not FORTRAN. Not RATFOR. Not, even, assembly language.
Machine Code.
Raw, unadorned, inscrutable hexadecimal numbers.
Directly.


Iteration Protocol

* The reason why you can iterate over different objects is that there is a specific protocol

>>> items = [1, 4, 5]
>>> it = iter(items)
>>> it.next()
1
>>> it.next()
4
>>> it.next()
5
>>> it.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>
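The same protocol can be sketched in Python 3 syntax (an assumption; the slides use Python 2), where the method call it.next() is written with the built-in next(it). The names first, second, third and exhausted are illustrative only.

```python
items = [1, 4, 5]
it = iter(items)          # ask the list for an iterator object

first = next(it)          # Python 3 spelling of it.next()
second = next(it)
third = next(it)

try:
    next(it)              # no items left
    exhausted = False
except StopIteration:
    exhausted = True      # the iterator signals the end by raising StopIteration
```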

Iteration Protocol

* An inside look at the for statement

for x in obj:
    # statements

* Underneath the covers

_iter = iter(obj)           # Get iterator object
while 1:
    try:
        x = _iter.next()    # Get next item
    except StopIteration:   # No more items
        break
    # statements
    ...

* Any object that supports iter() and next() is said to be "iterable."

Supporting Iteration

* User-defined objects can support iteration

* Example: Counting down...

>>> for x in countdown(10):
...     print x,
...
10 9 8 7 6 5 4 3 2 1
>>>

* To do this, you just have to make the object implement __iter__() and next()
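As a sketch of what the for statement does underneath the covers, the helper below spells out the iter()/next()/StopIteration steps by hand. It is written in Python 3 syntax (an assumption; the slides use Python 2), and manual_for is a hypothetical name used only for illustration.

```python
def manual_for(obj, body):
    """Call body(x) for every x in obj, mimicking a for loop (hypothetical helper)."""
    _iter = iter(obj)              # get iterator object
    while True:
        try:
            x = next(_iter)        # get next item
        except StopIteration:      # no more items
            break
        body(x)                    # the loop's statements

seen = []
manual_for("Yow!", seen.append)
print(seen)
```

Anything that iter() accepts can be passed as obj, which is why the same loop syntax works for lists, dicts, strings and files.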

Supporting Iteration

* Sample implementation

class countdown(object):
    def __init__(self,start):
        self.count = start
    def __iter__(self):
        return self
    def next(self):
        if self.count <= 0:
            raise StopIteration
        r = self.count
        self.count -= 1
        return r
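The sample implementation uses the Python 2 method name next(). A minimal sketch of the same class for Python 3 (an assumption; the slides use Python 2) renames that method to __next__(); everything else follows the slide.

```python
class countdown:
    def __init__(self, start):
        self.count = start

    def __iter__(self):
        return self                 # the object is its own iterator

    def __next__(self):             # called next() in Python 2
        if self.count <= 0:
            raise StopIteration     # signals the end of iteration
        r = self.count
        self.count -= 1
        return r

result = list(countdown(5))
print(result)
```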

Iteration Example

>>> c = countdown(5)
>>> for i in c:
...     print i,
...
5 4 3 2 1
>>>

Iteration Commentary

* There are many subtle details involving the design of iterators for various objects

* However, we're not going to cover that

* This isn't a tutorial on "iterators"

* We're talking about generators...


Part 2: Generators


Generators

* A generator is a function that produces a sequence of results instead of a single value

def countdown(n):
    while n > 0:
        yield n
        n -= 1

>>> for i in countdown(5):
...     print i,
...
5 4 3 2 1
>>>

* Instead of returning a value, you generate a series of values (using the yield statement)
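A brief sketch of the same generator in Python 3 syntax (an assumption; the slides use Python 2). Because the result is iterable, it can be handed to any consumer of iterables, not only a for loop.

```python
def countdown(n):
    while n > 0:
        yield n        # hand one value to the caller, then pause here
        n -= 1

values = list(countdown(5))    # collect every generated value
total = sum(countdown(3))      # 3 + 2 + 1
print(values, total)
```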


Generator Functions

* The function only executes on next()

>>> x = countdown(10)
>>> x
<generator object at 0x...>
>>> x.next()
Counting down from 10       <- Function starts executing here
10
>>>

* (In this example countdown() prints "Counting down from n" before entering its loop.)

* yield produces a value, but suspends the function

* Function resumes on next call to next()

>>> x.next()
9
>>> x.next()
8
>>>
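To make the delayed start observable, the sketch below records the "Counting down" message in a list instead of printing it. It uses Python 3 syntax (an assumption; the slides use Python 2), and the list name log is hypothetical.

```python
log = []

def countdown(n):
    log.append("Counting down from %d" % n)   # runs only once iteration starts
    while n > 0:
        yield n
        n -= 1

x = countdown(10)
log_before = list(log)     # calling the function has not run any of its body yet
first = next(x)            # body runs up to the first yield
log_after = list(log)
second = next(x)           # resumes right after the yield
third = next(x)
```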


Generator Functions

* When the generator returns, iteration stops

>>> x.next()
1
>>> x.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
StopIteration
>>>
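A small sketch of the end-of-iteration behavior in Python 3 syntax (an assumption; the slides use Python 2). It also shows the two-argument form of the built-in next(), which returns a default instead of raising; that form is not on the slide.

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1

x = countdown(1)
last = next(x)              # the only value this generator produces

try:
    next(x)                 # the function body has returned
    stopped = False
except StopIteration:
    stopped = True

fallback = next(x, None)    # default is returned instead of raising
```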

Generators vs. Iterators

* A generator function is slightly different than an object that supports iteration

* A generator is a one-time operation. You can iterate over the generated data once, but if you want to do it again, you have to call the generator function again.

* This is different than a list (which you can iterate over as many times as you want)
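The one-time nature of a generator can be sketched as follows, in Python 3 syntax (an assumption; the slides use Python 2). The variable names are illustrative.

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
first_pass = list(gen)       # consumes the generator
second_pass = list(gen)      # already exhausted, so nothing is produced

fresh = list(countdown(3))   # calling the function again gives a new generator

nums = [3, 2, 1]
list_passes = (list(nums), list(nums))   # a list can be iterated repeatedly
```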


Generator Functions

* A generator function is mainly a more convenient way of writing an iterator

* You don't have to worry about the iterator protocol (.next, .__iter__, etc.)

* It just works


Generator Expressions

* A generated version of a list comprehension

>>> a = [1,2,3,4]
>>> b = (2*x for x in a)
>>> b
<generator object at 0x...>
>>> for i in b: print i,
...
2 4 6 8
>>>

* This loops over a sequence of items and applies an operation to each item

* However, results are produced one at a time using a generator

Generator Expressions

* Important differences from a list comp.

  * Does not construct a list.

  * Only useful purpose is iteration

  * Once consumed, can't be reused

* Example:

>>> a = [1,2,3,4]
>>> b = [2*x for x in a]
>>> b
[2, 4, 6, 8]
>>> c = (2*x for x in a)
>>>
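The differences from a list comprehension can be checked with a short sketch in Python 3 syntax (an assumption; the slides use Python 2). The names has_len, first_pass and second_pass are illustrative.

```python
a = [1, 2, 3, 4]

b = [2 * x for x in a]      # list comprehension: builds the whole list immediately
c = (2 * x for x in a)      # generator expression: no list is constructed

has_len = hasattr(c, "__len__")   # a generator has no length, unlike a list

first_pass = list(c)        # iteration is its only use
second_pass = list(c)       # once consumed, it produces nothing more
```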

A Note on Syntax

* The parens on a generator expression can be dropped if used as a single function argument

* Example:

sum(x*x for x in s)     <- Generator expression
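A short sketch of this syntax rule in Python 3 (an assumption; the slides use Python 2). The second call is not on the slide: it shows that the parentheses are needed again as soon as the function receives another argument, here the optional start value of sum().

```python
s = [1, 2, 3]

# Sole argument: the generator expression needs no extra parentheses
total = sum(x * x for x in s)

# With a second argument, the generator expression must be parenthesized
offset_total = sum((x * x for x in s), 10)
```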


Generator Expressions

* General syntax

(expression for i in s if cond1
            for j in t if cond2
            ...
            if condfinal)

* What it means

for i in s:
    if cond1:
        for j in t:
            if cond2:
                ...
                if condfinal: yield expression
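The equivalence between the general syntax and the nested loops can be sketched with concrete conditions. The code is Python 3 (an assumption; the slides use Python 2), and the data and conditions are made up for illustration.

```python
s = range(1, 6)
t = "ab"

# Generator expression with two for clauses, each followed by a condition
gen = ((i, j) for i in s if i % 2 for j in t if j != "b")

# The same logic written as nested loops in a generator function
def expanded():
    for i in s:
        if i % 2:
            for j in t:
                if j != "b":
                    yield (i, j)

from_expression = list(gen)
from_function = list(expanded())
```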


Interlude

* We now have two basic building blocks

* Generator functions:

def countdown(n):
    while n > 0:
        yield n
        n -= 1

* Generator expressions

squares = (x*x for x in s)

* In both cases, we get an object that generates values (which are typically consumed in a for loop)

Programming Problem

Find out how many bytes of data were transferred by summing up the last column of data in this Apache web server log

81.107.39.38 - ... "GET /ply/ HTTP/1.1" 200 7587
81.107.39.38 - ... "GET /favicon.ico HTTP/1.1" 404 133
81.107.39.38 - ... "GET /ply/bookplug.gif HTTP/1.1" 200 23903
81.107.39.38 - ... "GET /ply/ply.html HTTP/1.1" 200 97238
81.107.39.38 - ... "GET /ply/example.html HTTP/1.1" 200 2359
66.249.72.134 - ... "GET /index.html HTTP/1.1" 200 4447

Oh yeah, and the log file might be huge (Gbytes)


The Log File

* Each line of the log looks like this:

81.107.39.38 - ... "GET /ply/ply.html HTTP/1.1" 200 97238

* The number of bytes is the last column

bytestr = line.rsplit(None,1)[1]

* It's either a number or a missing value (-)

81.107.39.38 - ... "GET /ply/ HTTP/1.1" 304 -

* Converting the value

if bytestr != '-':
    bytes = int(bytestr)

A Non-Generator Soln

* Just do a simple for-loop

wwwlog = open("access-log")
total = 0
for line in wwwlog:
    bytestr = line.rsplit(None,1)[1]
    if bytestr != '-':
        total += int(bytestr)

print "Total", total

* We read line-by-line and just update a sum

* However, that's so 90s...
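A runnable sketch of the plain for-loop approach in Python 3 syntax (an assumption; the slides use Python 2). Since no real log file is available here, io.StringIO stands in for open("access-log"), and the three sample lines are taken from the slides.

```python
import io

# Hypothetical stand-in for open("access-log")
wwwlog = io.StringIO(
    '81.107.39.38 - ... "GET /ply/ HTTP/1.1" 200 7587\n'
    '81.107.39.38 - ... "GET /favicon.ico HTTP/1.1" 404 133\n'
    '81.107.39.38 - ... "GET /ply/ HTTP/1.1" 304 -\n'
)

total = 0
for line in wwwlog:
    bytestr = line.rsplit(None, 1)[1]   # last whitespace-separated column
    if bytestr != '-':                  # '-' marks a missing value
        total += int(bytestr)

print("Total", total)
```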


Part 3: Pipelines


A Generator Solution

* Let's use some generator expressions

wwwlog     = open("access-log")
bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog)
bytes      = (int(x) for x in bytecolumn if x != '-')

print "Total", sum(bytes)

* Whoa! That's different!

  * Less code

  * A completely different programming style
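A runnable sketch of the generator-expression solution in Python 3 syntax (an assumption; the slides use Python 2). A list of sample lines from the slides stands in for the open log file, and the last stage is named byte_counts rather than bytes, a renaming made here only to avoid shadowing the Python 3 built-in bytes type.

```python
# Hypothetical stand-in for open("access-log")
wwwlog = [
    '81.107.39.38 - ... "GET /ply/ HTTP/1.1" 200 7587\n',
    '81.107.39.38 - ... "GET /favicon.ico HTTP/1.1" 404 133\n',
    '81.107.39.38 - ... "GET /ply/ HTTP/1.1" 304 -\n',
]

# Each stage is a generator expression fed by the previous one
bytecolumn = (line.rsplit(None, 1)[1] for line in wwwlog)
byte_counts = (int(x) for x in bytecolumn if x != '-')

total = sum(byte_counts)    # sum() pulls the data through every stage
print("Total", total)
```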

Being Declarative

* At each step of the pipeline, we declare an operation that will be applied to the entire input stream

access-log -> wwwlog -> bytecolumn -> bytes -> sum() -> total

bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog)

This operation gets applied to every line of the log file
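Declaring a stage does not process any data by itself; lines are only read when the final consumer asks for values. The sketch below makes that visible with a hypothetical tracing() generator that records each line as it is pulled. It uses Python 3 syntax (an assumption; the slides use Python 2) and made-up shortened log lines.

```python
pulled = []

def tracing(lines):
    """Yield each line unchanged, recording when it is actually read (hypothetical helper)."""
    for line in lines:
        pulled.append(line)
        yield line

log_lines = ['a 200 10', 'b 304 -', 'c 200 5']   # made-up, shortened log lines

wwwlog = tracing(log_lines)
bytecolumn = (line.rsplit(None, 1)[1] for line in wwwlog)
byte_counts = (int(x) for x in bytecolumn if x != '-')

read_before = len(pulled)     # the stages are declared, but nothing has been read
total = sum(byte_counts)      # consuming the last stage drives the whole pipeline
read_after = len(pulled)
```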


Generators as a Pipeline

* To understand the solution, think of it as a data processing pipeline

access-log -> wwwlog -> bytecolumn -> bytes -> sum() -> total

* Each step is defined by iteration/generation

wwwlog     = open("access-log")
bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog)
bytes      = (int(x) for x in bytecolumn if x != '-')

print "Total", sum(bytes)


Being Declarative

* Instead of focusing on the problem at a line-by-line level, you just break it down into big operations that operate on the whole file

* This is very much a "declarative" style

* The key: Think big...
