Python dictionaries - Verbs Index

DRAFT - June 28, 2002, Ron Zacharski

5 Python dictionaries

Your programming skills are about to take a quantum leap forward. This tutorial covers Python dictionaries, a very versatile way of structuring your data. You'll find them useful for a variety of natural language applications including parsers and semantic analyzers.

5.1 A 'not so wonderful' analogy

I've been trying to come up with a useful analogy to explain what's so special about Python dictionaries. The best I could come up with (and it's not that wonderful an analogy) is the following. Suppose you are not the best-organized person in the world. When someone gives you their phone number, you write that person's name and number on an index card and throw it in your 'address' box. When you want to look up a person's number, you need to examine each card in the box until you find the relevant information.

Now let's suppose a different scenario. Suppose you have your own personal assistant who has a photographic memory. You can just give your assistant the name of a person and he instantly responds with the telephone number. That is, instead of spending time searching through your box, you give your assistant what we call a key (a person's name) and he responds with the value for this key (the phone number). A Python dictionary consists of a set of key-value pairs. Let's look at an example:

>>> a = {'Ann': '592-6372', 'Ben': '282-8992', 'Flora': '927-9021', 'Isaac': '423-3790'}

Here we've defined a dictionary of four key-value pairs, for example 'Ann': '592-6372'. The members of a pair are separated by a colon, and each pair is separated from the other pairs by a comma. To look up a value we do the following:

>>> a['Ben']
'282-8992'
>>> a['Flora']
'927-9021'

and to add an entry to the dictionary we type:

>>> a['Hamid'] = '991-1911'

We can view the entire dictionary by typing the name of the dictionary:

>>> a
{'Isaac': '423-3790', 'Hamid': '991-1911', 'Ann': '592-6372', 'Flora': '927-9021', 'Ben': '282-8992'}
>>>

One way of viewing these dictionaries is as a table:

Key      Value
Isaac    423-3790
Hamid    991-1911
Ann      592-6372
Flora    927-9021
Ben      282-8992
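The three operations shown so far (creating a dictionary, looking up a value, and adding an entry) can be gathered into one short script. This is only a minimal sketch for a current Python 3 interpreter; the variable name phone_book is made up for this illustration. One thing to keep in mind: since Python 3.7 a dictionary keeps its entries in the order they were added, so a modern interpreter lists them in that order rather than in the shuffled order of the older session above.

```python
# Create a dictionary of names (keys) and phone numbers (values).
# phone_book is a hypothetical name chosen for this sketch.
phone_book = {'Ann': '592-6372', 'Ben': '282-8992',
              'Flora': '927-9021', 'Isaac': '423-3790'}

# Look up a value by its key.
print(phone_book['Ben'])

# Add a new key-value pair.
phone_book['Hamid'] = '991-1911'

# Walk through the pairs, one table row per line.
for name, number in phone_book.items():
    print(name, number)
```

Looping over items() gives each key together with its value, which is the same information as the table view of the dictionary.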

Here's another example:

Key        Value
Baker      English Syntax
Spencer    Morphological Theory
Radford    Syntactic Theory

Here's the associated Python code for the above table:

>>> books = {'Baker': 'English Syntax', 'Spencer': 'Morphological Theory', 'Radford': 'Syntactic Theory'}
>>> books['Baker']
'English Syntax'
>>> books['Radford']
'Syntactic Theory'

I can add an entry into the dictionary as follows:

>>> books['Jackendorf'] = 'Semantic Theory'

And then check the result:

>>> books['Jackendorf']
'Semantic Theory'
>>>

Let's look at another example. Suppose that the following words are unambiguous:

Word     Part of Speech
dog      noun
cat      noun
saw      verb
a        det (determiner)
the      det
happy    adj (adjective)
lazy     adj

Let's represent the relationship between a word and its part of speech by using a Python dictionary:

>>> pos = {'dog': 'noun', 'cat': 'noun', 'saw': 'verb', 'a': 'det', 'the': 'det', 'happy': 'adj', 'lazy': 'adj'}
>>>

What's the part of speech of happy?

>>> pos['happy']
'adj'
>>>

Let's say I want to write a function, tag, that does the following:

>>> tag('the happy cat')
' det adj noun'
>>> tag('the happy cat saw the lazy dog')
' det adj noun verb det adj noun'
>>>

That is, it takes as input a string of words and returns a string of part-of-speech tags.

If you think you can write this on your own, give it a try before turning the page. Otherwise, continue.

Let's rough out our function:

def tag(sentence):
    """look up each word in the sentence and return its part of speech"""
    # create dictionary
    # initialize return string
    # divide string into words
    # for each word
    #     lookup part of speech and add it to pos_tags
    # return pos_tags

So, the input to this function is called 'sentence' [1]. My plan is to have the function return a string called pos_tags (for 'part of speech tags') that contains the part-of-speech tag for each word in the sentence. pos_tags will start out empty ('') and we will add the part-of-speech tags to the end.

Let's convert each of the English-like comments in the function to Python code.

def tag(sentence):
    """look up each word in the sentence and return its part of speech"""
    # create dictionary
    # initialize return string
    # divide sentence into words
    # for each word
    #     lookup part of speech and add it to pos_tags
    # return pos_tags

The first thing we want to do is create a Python dictionary. Let's call it pos (for 'part of speech'):

def tag(sentence):
    """look up each word in the sentence and return its part of speech"""
    # create dictionary
    pos = {'dog': 'noun', 'cat': 'noun', 'saw': 'verb',
           'a': 'det', 'the': 'det', 'happy': 'adj', 'lazy': 'adj'}
    # initialize return string
    # divide sentence into words
    # for each word
    #     lookup part of speech and add it to pos_tags
    # return pos_tags

[1] Remember that sentence is just a name. The name has no significance; it's just a name I invented. You can replace every occurrence of sentence with a name you invent and the example will work equally well.


The next step is to initialize the return string:

def tag(sentence):
    """look up each word in the sentence and return its part of speech"""
    # create dictionary
    pos = {'dog': 'noun', 'cat': 'noun', 'saw': 'verb',
           'a': 'det', 'the': 'det', 'happy': 'adj', 'lazy': 'adj'}
    # initialize return string
    # divide sentence into words
    # for each word
    #     lookup part of speech and add it to pos_tags
    # return pos_tags

We want the return string to initially be empty, '':

def tag(sentence):
    """look up each word in the sentence and return its part of speech"""
    # create dictionary
    pos = {'dog': 'noun', 'cat': 'noun', 'saw': 'verb',
           'a': 'det', 'the': 'det', 'happy': 'adj', 'lazy': 'adj'}
    # initialize return string
    pos_tags = ''
    # divide sentence into words
    # for each word
    #     lookup part of speech and add it to pos_tags
    # return pos_tags

Next, we want to divide the string into words:

def tag(sentence):
    """look up each word in the sentence and return its part of speech"""
    # create dictionary
    pos = {'dog': 'noun', 'cat': 'noun', 'saw': 'verb',
           'a': 'det', 'the': 'det', 'happy': 'adj', 'lazy': 'adj'}
    # initialize return string
    pos_tags = ''
    # divide sentence into words
    # for each word
    #     lookup part of speech and add it to pos_tags
    # return pos_tags


Recall from chapter 4 that we can split a string into words by using the split method:

def tag(sentence):
    """look up each word in the sentence and return its part of speech"""
    # create dictionary
    pos = {'dog': 'noun', 'cat': 'noun', 'saw': 'verb',
           'a': 'det', 'the': 'det', 'happy': 'adj', 'lazy': 'adj'}
    # initialize return string
    pos_tags = ''
    # divide sentence into words
    words = sentence.split()
    # for each word
    #     lookup part of speech and add it to pos_tags
    # return pos_tags

Now, we are at the for loop:

# for each word
#     lookup part of speech and add it to pos_tags

which we can translate as follows:

# for each word
for word in words:
    # lookup part of speech and add it to pos_tags
    pos_tags = pos_tags + ' ' + pos[word]

The line pos_tags = pos_tags + ' ' + pos[word] adds a space and the part-of-speech of the current word to the end of pos_tags. Finally, we need to return pos_tags and our function is finished:

def tag(sentence):
    """look up each word in the sentence and return its part of speech"""
    # create dictionary
    pos = {'dog': 'noun', 'cat': 'noun', 'saw': 'verb',
           'a': 'det', 'the': 'det', 'happy': 'adj', 'lazy': 'adj'}
    # initialize return string
    pos_tags = ''
    # divide sentence into words
    words = sentence.split()
    # for each word
    for word in words:
        # lookup part of speech and add it to pos_tags
        pos_tags = pos_tags + ' ' + pos[word]
    # return pos_tags
    return pos_tags
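To check the finished function, here is a self-contained sketch with a sample call. It also shows one optional variation that is not part of the walkthrough: collecting the tags and gluing them together with the string method join, which avoids the leading space in the result. The names POS and tag_joined are invented for this illustration.

```python
# Shared word -> part-of-speech dictionary (POS is a name made up for this sketch).
POS = {'dog': 'noun', 'cat': 'noun', 'saw': 'verb',
       'a': 'det', 'the': 'det', 'happy': 'adj', 'lazy': 'adj'}


def tag(sentence):
    """look up each word in the sentence and return its part of speech"""
    pos_tags = ''
    for word in sentence.split():
        # add a space and the tag of the current word to the end
        pos_tags = pos_tags + ' ' + POS[word]
    return pos_tags


def tag_joined(sentence):
    """Hypothetical variant: same lookups, joined without a leading space."""
    return ' '.join(POS[word] for word in sentence.split())


print(repr(tag('the happy cat')))         # ' det adj noun'
print(repr(tag_joined('the happy cat')))  # 'det adj noun'
```

Both versions still raise an error for a word that is not a key in the dictionary.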

Exercise 1: A modification to the tagger:


A current format for tagging text uses opening and closing tags as follows:

raw text:     the happy cat
tagged text:  <det>the</det> <adj>happy</adj> <noun>cat</noun>

Can you modify the above tagger program to have similar output? That is,

>>> tagger('the happy dog')
' <det>the</det> <adj>happy</adj> <noun>dog</noun>'
>>>

(see solutions at the end of this tutorial)

5.2 Entries not in dictionary

Let's say we wrote the function described in exercise 1 and it's looking pretty good:

>>> tagger('the lazy cat')
' <det>the</det> <adj>lazy</adj> <noun>cat</noun>'
>>>

What happens if we try a word that is not in our part-of-speech dictionary?

>>> tagger('the happy poodle')
Traceback (most recent call last):
  File "", line 1, in ?
  File "C:\Python23\tagger.py", line 17, in tagger
    opening_tag = ''
KeyError: 'poodle'
>>>

Hmmm. This does not look so wonderful.

Let's look at another example of the problem:

>>> pos = {'dog': 'noun', 'cat': 'noun'}
>>> pos['dog']
'noun'
>>> pos['poodle']
Traceback (most recent call last):
  File "", line 1, in ?
KeyError: 'poodle'
>>>

Remember, we described dictionaries as containing key-value pairs. This error occurs when the key is not present in the dictionary. So the above error occurred because poodle is not a key in the pos dictionary. This makes our tagger function not so useful. It will crash and die (well, stop with an error) if it encounters an unknown word. However, there is an easy way to fix this problem. Just as with lists, we can check whether a dictionary has a specific key by using "in":

>>> pos = {'dog': 'noun', 'cat': 'noun'}
>>> 'dog' in pos
True
>>> 'poodle' in pos
False
>>> 'cat' in pos
True

As you can see, using "in" returns True or False depending on whether or not the dictionary contains a specific key. What we might want to do for our tagger program is for a given word:

if the pos dictionary has the word as a key:
    then look up the pos of the word and create a tag
else:
    use the generic tag 'word'
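The plan above can be tried on single words before touching the tagger. The sketch below is an illustration under assumptions: the helper names lookup_pos and lookup_pos_get are invented here, and 'word' as the fallback follows the generic tag named in the plan. The second helper uses the dictionary method get, which returns a default value when the key is missing; the text itself relies only on "in".

```python
pos = {'dog': 'noun', 'cat': 'noun'}


def lookup_pos(word):
    """Return the part of speech of word, or 'word' if it is unknown."""
    if word in pos:
        return pos[word]
    else:
        return 'word'


def lookup_pos_get(word):
    """Same result, using dict.get with a default value (hypothetical helper)."""
    return pos.get(word, 'word')


print(lookup_pos('dog'))         # noun
print(lookup_pos('poodle'))      # word
print(lookup_pos_get('poodle'))  # word
```

Either form avoids the KeyError, because pos[word] is never evaluated for a word that is missing from the dictionary.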

So instead of

>>> tagger('the happy poodle')
Traceback (most recent call last):
  File "", line 1, in ?
  File "C:\Python23\tagger.py", line 17, in tagger
    opening_tag = ''
KeyError: 'poodle'
>>>

we would get

>>> tagger('the happy poodle')
' <det>the</det> <adj>happy</adj> <word>poodle</word>'

Let's add this English description of what we want to do to our tagger function:

def tagger(sentence):
    """look up each word in the sentence and tag it"""
    # create dictionary
    pos = {'dog': 'noun', 'cat': 'noun', 'saw': 'verb',
           'a': 'det', 'the': 'det', 'happy': 'adj', 'lazy': 'adj'}
    tagged_sentence = ''
    words = sentence.split()
    for word in words:
        #
        # if the pos dictionary has the word as a key:
        #     then set the_pos to the pos of the word
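One possible way to turn this outline into working code is sketched below. It is an illustration, not the author's solution: the angle-bracket tag format (such as <noun>dog</noun>), the variable name closing_tag, and the exact way the pieces are concatenated are assumptions made for this sketch. Unknown words get the generic tag 'word'.

```python
def tagger(sentence):
    """look up each word in the sentence and tag it"""
    # create dictionary
    pos = {'dog': 'noun', 'cat': 'noun', 'saw': 'verb',
           'a': 'det', 'the': 'det', 'happy': 'adj', 'lazy': 'adj'}
    tagged_sentence = ''
    words = sentence.split()
    for word in words:
        # if the pos dictionary has the word as a key,
        # set the_pos to the pos of the word
        if word in pos:
            the_pos = pos[word]
        # otherwise fall back to the generic tag
        else:
            the_pos = 'word'
        # assumed tag format: <pos>word</pos>
        opening_tag = '<' + the_pos + '>'
        closing_tag = '</' + the_pos + '>'
        tagged_sentence = tagged_sentence + ' ' + opening_tag + word + closing_tag
    return tagged_sentence


print(tagger('the happy poodle'))
```

Because the membership test comes before the lookup, pos[word] is only evaluated for words that are known keys, so an unknown word such as poodle no longer causes a KeyError.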
