Lecture 1 - DePaul University - Python read csv file into dictionary

CSC 401 - Lecture #7
Yosef Mendelsohn

Dictionaries review

Suppose you want to store employee records in a collection and access the employee records using the employees' social security numbers.

The list collection is not really useful for this problem because list items are accessed using an index, i.e. a position in a collection.

For this problem we need a collection type that allows access to stored items using user-defined "indices", which are referred to as keys.

>>> employees = {'022-83-8795': 'Allison Andover', '987-65-4321': 'Benjamin Benton', '123-45-6789': 'Cassidy Cantor', '874-74-1532': 'Daniel Danton'}
>>> employees['123-45-6789']
'Cassidy Cantor'

A dictionary, also called a map, is a collection type that stores key-value pairs.

In the employees dictionary above, '022-83-8795': 'Allison Andover' is a key-value pair.

The key '022-83-8795' is the "index" of the stored object and is used to access the object. The key must be an object of an immutable type (like a string, a number, or a tuple).

The following are the dictionary collection type properties:
- Items in the collection are accessed by key, not by index (i.e. offset).
- Items in the collection have no position that you can index into. Since Python 3.7 a dictionary remembers the order in which its keys were inserted, but items are still accessed by key, so do not write code that depends on an item's position.
- Dictionaries are mutable, like lists, meaning they can be modified to contain different items.
- Dictionaries have dynamic size, meaning they can grow or shrink.

Dictionary problems

Problem: Write a function wordFreq() that takes as a parameter a file name (i.e. a string object). (In the interest of time, you don't have to worry about exception handling.) The function should compute the frequency of each word in the file and then output the list of words and their frequencies as shown below. The function should also return the dictionary created. Use the file example.txt to test your code.

Incidentally, this is a very good example of a situation in which you should think through the problem ahead of time using a pen and paper with pseudocode and/or diagrams.
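One building block that the problem above relies on is counting with a dictionary. The following is a minimal sketch of that idiom only, not the course solution: the helper name count_words and the sample word list are assumptions made for illustration, and the sketch works on a list of already-cleaned words rather than on a file.

```python
def count_words(words):
    """Return a dictionary mapping each word to the number of times it occurs.

    count_words is a hypothetical helper, not part of the lecture's solution file.
    """
    counts = {}
    for word in words:
        # get() returns 0 for a word we have not seen yet
        counts[word] = counts.get(word, 0) + 1
    return counts


# Made-up data standing in for the words read from a file
sample_words = ['this', 'is', 'a', 'sample', 'this', 'sample', 'this']
frequencies = count_words(sample_words)
for word, count in frequencies.items():
    print(word, count)
```

Each word is a key and its count is the value, so looking up or updating a count never requires searching through a list.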
Hints:
- Remove all punctuation and capitalization from the words before you process them to ensure accurate results. Perhaps the easiest way to do this is to replace each of the following characters with a space: . , ! ? (period, comma, exclamation point, question mark).
- Can you think of any other pre-processing you might want to do? Hint: 'the' and 'The' would be considered two different words!
- To remove punctuation, recall the use of the 'in' operator, e.g.

    for character in '!.,?':
        string = string.replace(character, " ")

The function would be used as follows:

>>> wordFreq('sample.txt')
a 2
no 1
this 3
text 4
is 2
contains 1
but 1
sample 3
file 3
in 1
or 1

Variation: Modify the function into one called duplicates() that accepts a file name as the argument. This function should return True if there are any duplicate words in the file and False if there are no duplicates in the file. Test your code with the file 'duplicates.txt' and then again with the file 'no_duplicates.txt'.

HINT: To save a LOT of extra (completely redundant) coding, use the wordFreq function we created in the previous step inside this function!

The random module

Let's begin by reviewing the module random.

Practice problem: Implement a function guess() that takes an integer n as a parameter and implements a simple, interactive guessing game. The function should start by choosing a random integer in the range from 1 up to and including n. The function should then repeatedly ask the user to guess the chosen number. When the user guesses correctly, the function should print a 'You got it' message and terminate.
Each time the user guesses incorrectly, the function should indicate whether the guess was too high or too low. If the user types something other than an integer, the function should recover gracefully.

>>> guess(100)
Enter your guess: fifty
That was not a valid number.
Enter your guess: 50
Too high
Enter your guess: 25
Too low
Enter your guess: 38
Too high
Enter your guess: 31
Too high
Enter your guess: 28.5
That was not a valid number.
Enter your guess: 28
Too high
Enter your guess: 26
You got it!

See the solution in this week's file.

More collection types

The standard Python library supports two more collection types: tuples and sets. Let's start with sets.

The set collection type

The set type is a collection type used to store an unordered collection of unique values. Because this collection is unordered, we cannot do things like:

    print( set_name[0] )

Perhaps the key principle of sets, and probably the main thing that makes sets different from other types of collections, is that sets do not contain any duplicates. Therefore, one use of sets is to remove duplicates from other types of collections. For example, if you take an existing list that has duplicates and create a set out of it, the set will contain the same values as the list, but without any of the duplicates (and in no guaranteed order).

Creating a set: Here are two ways of creating a set:

    grains = {'rice','wheat','oat'}   # Not the preferred way
    grains = set()

As you can see, the first option looks almost exactly like a dictionary. Not surprisingly, this can lead to confusion. The ONLY time you should create a set using this syntax is when you are creating a brand new set and already know some of the values you want to place inside it. However, if you are creating an empty set, then do not use the first syntax. Instead, you should always create your new set using the set() "constructor". (We'll discuss what we mean by constructors in a later lecture.) Once you have a new set, you can easily start adding items to it.
Here is an example where we create a set and already know some of the values we want to put in it. Again, the one time it's okay to create the set using braces is when you know the initial values you want to put inside the set:

>>> grains = {'rice', 'wheat', 'corn', 'rye', 'oat', 'wheat', 'millet'}

In the above example, Python is smart enough to figure out that we are trying to create a set as opposed to a dictionary. As you can probably surmise, Python figures this out by noting that the values we entered are not pairs. If Python saw key:value syntax, it would instead create a dictionary.

>>> grains
{'wheat', 'corn', 'rye', 'millet', 'oat', 'rice'}
>>> type(grains)
<class 'set'>

Now compare what we just did with trying to create an empty set using braces:

>>> fruit = {}

In this case, there is nothing to tell Python whether we want a set or a dictionary, and empty braces always produce a dictionary! To prove this, note what happens when we check the type of our "set":

>>> type(fruit)
<class 'dict'>

Therefore, remember that when creating an empty set, you should always use the set() constructor:

>>> fruit = set()

In any case, back to sets. To add an item to a set, use the method add():

>>> fruit.add("apple")
>>> fruit.add("pear")
>>> fruit.add("apple")   # will be ignored
>>> print(fruit)
{'apple', 'pear'}

Note that 'apple' only appears once since this is a set.

Set techniques

Some of the techniques we use when working with sets are: membership, union, intersection, and difference. For purposes of our discussion, consider these two sets:

    fruit1 = {'apple','pear','banana'}
    fruit2 = {'apple','pear','blueberry'}

Membership: Use this when you are trying to determine whether a certain item is present in the set.

    item in setName    returns True if the item is in the set and False otherwise

>>> 'banana' in fruit1
True
>>> 'banana' in fruit2
False

Union: union(setName) returns a set. The set is produced by combining the elements of the two sets.
However, because the result must be a set, any items that overlap (i.e. would be duplicates) appear only once. You can invoke union in two ways:

    set1.union(set2)   returns a set containing all elements of set1 and all elements of set2, with any duplicates removed
    set1 | set2        another way of performing the same operation. The | symbol is known as a "pipe".

>>> fruit1 | fruit2
{'banana', 'apple', 'pear', 'blueberry'}

Intersection: Use this when you are searching for items that are in both sets.

    set1.intersection(set2)
    set1 & set2        another way of performing the same operation

>>> fruit1 & fruit2
{'pear', 'apple'}

Symmetric difference: This is performed by the ^ operator. The symmetric difference gives you a combination of two sets, but excludes any items that happen to be in both sets.

    set1 ^ set2

>>> fruit1 ^ fruit2
{'blueberry', 'banana'}

Difference: The - operator gives you everything that is in the first set that you specify, but is not present in the second set that you specify. In other words, unlike the previous examples, the order matters when typing the code for 'difference'. set1 - set2 will return everything in set1 minus those items that also appear in set2.

>>> fruit1 - fruit2
{'banana'}
>>> fruit2 - fruit1
{'blueberry'}

A few more examples:

>>> s = {1, 2, 3}
>>> t = {2, 3, 4}
>>> s | t
{1, 2, 3, 4}
>>> s & t
{2, 3}
>>> s ^ t
{1, 4}
>>> s - t
{1}
>>> t - s
{4}
>>> (s - t) | (t - s)
{1, 4}

Converting a list to a set: Imagine that you have a list, but you want to get rid of any duplicates from that list.
For example:

    lstFruit = ['apple','pear','banana','pear', 'pear','apple']

You can easily convert this by using the set "constructor" function like so:

    setFruit = set(lstFruit)

setFruit will contain: {'apple', 'pear', 'banana'}

To learn more, you can always do a quick check (albeit in a less easy-to-read format) by typing at the console:

>>> help(set)
Help on class set in module builtins:
[…]

Question: Can you think of a problem we did today that would have been MUCH easier using sets?

Answer: The duplicates() function!

Problem: Recall the function we wrote in the dictionary lecture: duplicates(), which takes as a parameter a string representing the name of a text file. The function returns True if the file contains duplicate words and False if the file doesn't contain duplicate words. Make sure to remove all punctuation (i.e. all commas, question marks, periods, colons, and semicolons) before determining whether any words are duplicated so that the punctuation won't interfere with the process. Your function should ignore case (e.g. APPLE = Apple = apple).

The above version used a dictionary to determine if there were duplicates. Rewrite the function as duplicates_set() so that it uses a set instead of a dictionary. (It should be much simpler.) See the code in the solutions file for this lecture.

The tuple collection type

The tuple type is essentially the same as a list with one important distinction: a tuple is immutable, i.e. a tuple cannot be modified.

You'll also note that we create a tuple using parentheses as opposed to square brackets:

>>> t = (3, -7, 92)
>>> t
(3, -7, 92)
>>> type(t)
<class 'tuple'>
>>> t[0]
3
>>> t[1] = 3   # trying to modify the tuple!
Traceback (most recent call last):
  File "<pyshell#4>", line 1, in <module>
    t[1] = 3
TypeError: 'tuple' object does not support item assignment

Collection type properties for tuples include:
- Items in the collection are ordered.
- Items in the collection are accessed using an index (offset).
- Like strings, but unlike lists, tuples are immutable.
- Tuples have a fixed length, since they cannot change. For example, whereas lists can grow and shrink, tuples cannot.

One important use of tuples: Because tuple objects are immutable, a tuple can be used as a dictionary key.

More about class tuple:

>>> help(tuple)

Problem: Implement a function lookup() that provides a phonebook lookup feature. The function takes as a parameter a dictionary representing a phonebook. In the dictionary, tuples containing first and last names of individuals (the keys) are mapped to strings containing phone numbers (the values). Your function should provide a simple user interface through which a user can enter the first and last name of an individual and obtain the phone number assigned to that individual. It should prompt the user for first and last names indefinitely, stopping only when the user does a keyboard interrupt (e.g. control-c).

For example, it would be used as follows:

>>> phonebook = {('Luca', 'Elam'): '(312) 123-4567',
                 ('Djengo', 'Thomas'): '(773) 987-6543',
                 ('Devon', 'Reilly'): '(520) 454-6677'}
>>> lookup(phonebook)
Enter the first name: Luca
Enter the last name: Elam
(312) 123-4567
Enter the first name: Devon
Enter the last name: Reilly
(520) 454-6677
Enter the first name:

See the solution in the solutions file for this week.

Review of loops and collections

To review loops and two-dimensional lists, we will do a practice problem.

Implement a function csv_to_list() that transforms a .csv spreadsheet into a Python two-dimensional list. Your function will take as input the name of the .csv file and return the corresponding two-dimensional list.

Here is what the cities.csv file looks like when opened up in a text editor:

[listing of cities.csv not preserved]

The following is an example of how the function would be used:

>>> csv_to_list('cities.csv')
[['Paris', 'Sao Paulo', 'Tokyo', 'Kinshasa'], ['London', 'Mexico City', 'Seoul', 'Lagos'], ['Moscow', 'New York', 'Mumbai', 'Cairo']]

Here is the same function invoked on the file data.csv:

>>> csv_to_list('data.csv')
[['3', '5', '6', '4'], ['8', '5', '2', '3'], ['2', '8', '9', '1']]

The solution is found in this week's code solutions.

A note about the upcoming topics

A few of the concepts from this point forward through the end of the course may become a little bit confusing. Therefore, be sure to really try to understand the definitions and follow the code examples as you move through. However, it is very likely that you will end up confused at times. Don't be discouraged. These topics will likely need a few 'passes' and mental review before the concepts discussed start to sink in. You will also find the readings from the textbook to be very helpful in getting a handle on these concepts. The reason is that the notes for a course can never encompass the length and detail of an entire textbook.
(Which is why we require you to get one!) Some of the upcoming topics are a situation where I particularly recommend that you read (and reread) the text.

Review: Functions

Functions are a way to package a group of statements so that those statements can be executed within a program by typing a single statement.

The purpose of functions includes:
- Code reuse
- Encapsulation or information hiding
- Modularity or procedural decomposition

So far in this course we have been using:
- "Predefined", a.k.a. "built-in", functions that work on their own, such as open(), print(), etc.
- Predefined functions that are meant to work with specific data types, such as replace(), sort(), append(), etc.
- Functions that we have defined on our own, such as the ones we have been creating throughout the course.

We are now ready to start a discussion of how functions work "under the hood". Doing so will allow us to further harness the power and flexibility of functions.

We will use the following simple function as we continue our discussion:

    def add_numbers(num1, num2):
        total = num1 + num2
        return total

Terminology

Calling / invoking vs. defining functions

When we execute a function by typing out its identifier, we are said to be "invoking" or "calling" that function. In the example below, we invoke the function len().

>>> lst = [3,4,5,6]
>>> len(lst)
4

When we create a function, we are said to be "defining" a function:

>>> def calc_average(x, y):
        return (x+y)/2.0

Formal parameters, parameters, arguments

This is important terminology, so be sure to review as needed until you get comfortable with it.

'Formal parameter' is the term that refers to information required by the function when the function is invoked. For example, the function add_numbers has 2 formal parameters.

'Argument' refers to the information we provide when we invoke (execute) a function. Each argument must therefore match up with a corresponding formal parameter.
You will also hear arguments referred to as "parameters" (or "actual parameters").

Example: When we invoke the add_numbers function, we must provide two arguments. If we neglect to provide these 2 arguments, the call fails with a TypeError, and unless that error is handled our program will crash. Say we invoked it with: add_numbers(3, 7)
- The first argument, 3, will be assigned to the first parameter, num1.
- The second argument, 7, will be assigned to the second parameter, num2.

Note: Don't confuse "parameters", which refers to the information required by the function when we create it, with "arguments", which refers to the information passed when we invoke (execute) the function.

CONFUSION TIME: All that being said, you may sometimes hear people refer to parameters as "formal parameters" and arguments as simply "parameters". Confusing? Yes. Inconsistent? Yes. Unfortunately, it just is that way. The main thing is for you to recognize the distinction between the variables we create when we are defining a function (what we typically call 'parameters') vs. the information we pass to the function when we are invoking it (what we typically call 'arguments').

"Calling program": A function must always be called from somewhere. For example, we nearly always invoke a function from within another function. If this is the case (and it almost always is!), the first function (i.e. the one that invoked the other function) is known as the "calling program".

The Process

The following steps are carried out by Python when a function is called:

1. The calling program (by 'calling program' we typically simply mean a function) suspends execution. For example, if you are writing a function called do_something() and from inside it you then invoke a function do_something_else(), then do_something() is the "calling program". At this point, the calling program, i.e. do_something(), temporarily stops execution. Note: The Python interpreter keeps track of precisely where inside do_something() we were when we invoked do_something_else(). As a result, when we are ready to return to our original function, we know exactly where to pick up.

2. As discussed a few minutes ago, any formal parameters that are present get assigned the input information supplied by the arguments (aka actual parameters). E.g. when we invoke add_numbers(3, 7), the formal parameter num1 is assigned the argument 3 and the formal parameter num2 is assigned the argument 7.

3. The body of the function is executed.

4. Execution of the calling program (i.e. the program/function from where add_numbers() was originally invoked) resumes. Recall that Python has kept track of exactly where we were in the calling program when we invoked the outside function. Therefore, the calling program can continue from the point immediately after the location where add_numbers() was originally invoked.

Example 1: See the code in the file invoke1.py.

Example 2: Consider the example in invoke2.py. It explicitly shows with print statements what was happening in the previous example.

Note: invoke2.py is an example of storing an activation record on the program stack. We will discuss what is meant by this down the road.

Example 3: Look at the example in the file nested_functions.py. It contains several nested function calls.

The return value

A return statement anywhere inside the body of a function terminates the execution of the function. Also, if that return statement includes a return value, then that value is sent back to the calling program.

Let's say a particular function returns a value. If that is the case, then inside the calling program, a function call can be viewed as if it were an expression that is evaluated and a value is produced.

Often I like to think of it as though the function call is replaced by the value that it returns.
For example:

>>> def f(x):
        return x**4
>>> f(4)
256
>>> tmp = f(4)
>>> tmp
256

Think of the function call f(4) as being replaced by its return value (in this case, 256).

>>> tmp = 3*f(4) + 1
>>> tmp
769

Think of the function call f(4) as being replaced by its return value (in this case, 256), which is then multiplied by 3 before 1 is added.

Note that the value returned is not always going to be a number. For example, the function invoked in the example below returns a boolean indicating whether the argument was odd or even:

Example: returning_booleans.py

>>> test(4)
Yes!!
>>> test(5)
No...

Parameter passing

This is also a topic that will require a few "passes" in order to really get it!

When we pass parameters (a.k.a. arguments) to a function, we say that those values are passed by reference. (There is another mechanism, known as passing by value. However, we won't get into that here.)

This seemingly esoteric point turns out to be very important. Therefore, be sure to spend the necessary amount of time reviewing this concept until you understand what is going on.

For purposes of our discussion, think of a reference as an address to the location in the computer's memory where the object is stored.

To begin, consider what that means if the object being passed is an immutable type: see immutable_params.py.

How can we understand what's going on?

Let's start by invoking funcB. That function starts by creating an instance of the integer object with value 3 and assigning it to variable num.

Function funcB then makes a call to function funcA, passing, as an actual parameter, not the value 3, but instead a reference to that value. That is, it passes a reference to the integer object. (Don't panic over this last statement, just go along with it for now.)

When function funcA is called from within function funcB, the variable num in funcA will point to the same object as the variable num in funcB, i.e. both will point to the single integer object (that happens to be holding the value 3).

However, when inside of funcA we assign num a value of 5, num in funcA will be pointing to an entirely different object*. This is because numbers in Python (integers in particular) are immutable: the object holding 3 cannot be changed, so the assignment makes num refer to another object instead.

The net effect is that the original value in the calling program (funcB) cannot be changed. This is because it is immutable.

*Recall: Every integer object in Python lives in a specific location in memory. So when we change num from 3 to 5, num now references a different location in memory. Remember the diagrams I have been pushing you to do? This is a great time to draw them out!

What happens if we pass a reference to a mutable object? See mutable_params.py for an example.

How do we understand what's going on in this case?

The function f starts by creating an instance of a list object containing integer objects 3, 6, 9, and 12 and assigning it to variable lst.

The function f then makes a call to function g, passing, as an actual parameter, the reference to the list object lst.

When the function g is called from within function f, the variable lst in g will point to the same object as variable lst in f, i.e. both will point to the same list [3, 6, 9, 12].

However, when g assigns to lst[0], the list itself is modified in place, so lst[0] refers to the new object 5 both in g and in f. This is because lists in Python are mutable.

The net effect is that the original list in the calling program can be changed since it is mutable.

Local variables

This is not a difficult topic, but it IS a very important topic to understand.

In the immutable_params.py example, the variable num is used inside function funcA and inside function funcB. A very important thing to note is that there are two variables called 'num'. Each of those two variables exists only inside its own function.

Key point: Every name (e.g. variable, function, module, class) in Python has a place where it "lives".
This place is known as a "namespace".

Outside of a variable's namespace, that variable does not exist. Therefore, any reference to that variable will result in an error.

Example: Loading the code in the file outside.py will produce an error. Why?

Answer: Inside the function local2, we declare the variable i. However, that variable exists only inside the "namespace" of the function local2. Once we are outside of the local2 namespace, that variable i no longer exists. As a result, when the last line of the program says print(i), Python raises a NameError because no variable named i exists in the module's namespace.

In immutable_params.py there are three namespaces: the function funcA, the function funcB, and the module itself.

Also note that the names funcA and funcB both live inside the immutable_params module's namespace.

The two variables num live inside their respective namespaces, namely the functions in which they are defined.

Key point: Even though they have the same identifier, they are two completely different and unrelated variables!

Important things to understand about namespaces:
- When you use a name (a variable name, for example) in a program, the Python interpreter creates, changes, or looks up the name in a namespace, i.e. the location where that variable lives.
- Names defined inside a function can only be seen by the code inside the function.
- Names defined inside a function do not clash with names defined outside of it, even if they are the same.
- Names defined inside a function exist only during the execution of the function. They no longer exist after the function completes execution. That is, when the function ends, the name (e.g. variable) is gone forever.

Therefore, each function call creates its own namespace. Different function calls will have different corresponding namespaces.

As an example, see the code in local_variables.py.

Scoping and global variables

Scope is a fancy word for describing where something 'exists'.
Basically, the scope of a name is simply its namespace. In other words, if at a given time someone asks you what the scope of something (e.g. a variable) is, you are being asked where that variable / object / function / etc. "exists". For example, in the following code:

    def do_stuff():
        k = 'hello'

the scope of the variable 'k' is the function do_stuff().

Names that are defined inside a function are said to have local scope (local with respect to the function), i.e. their namespace is the body of the function.

Names that are defined outside of a function are said to have global scope, i.e. their namespace is the whole module. For example, suppose you define a variable in the very first line of a module. In this case, that variable exists inside the entire module. Even inside any of the functions in your module, you could still access that global variable.

Note: Global variables should be used with caution, or even avoided. They have their uses, but you should be very careful with them. More on this at a later point. For now, simply try to make sure that all of your variables are declared inside a function.

The module is the largest namespace in Python: there is no namespace beyond it. For example, there is no namespace that encompasses multiple modules.

Built-in identifiers (such as open, print, len, etc.) are names that have been defined in a module called 'builtins'. (Words such as if and while are different: they are keywords of the language, not names defined in a module.)

Question: If the above names are defined in a separate module, how is it that we are able to use them in our programs without first importing that module?

Answer: Python automatically makes the 'builtins' module available every time we run a program.

Example 1: See scope1.py

Example 2: See scope2.py

In the second example, note how the identifier x appearing inside function func is assumed to be an entirely new variable with a scope local to the function. This variable x (the one inside the function) is an entirely different variable from the global variable x.
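The file scope2.py is not reproduced in these notes, so the following is only a sketch of the behavior just described, with made-up values: assigning to x inside a function creates a new local x and leaves the global x alone.

```python
x = 10  # global (module-level) variable; the value is an arbitrary example


def func():
    x = 99      # creates a new local x; the global x is untouched
    return x


local_value = func()
print(local_value)  # 99: the value of the local x inside func
print(x)            # 10: the global x is unchanged
```

The two names are spelled the same, but each lives in its own namespace, which is why the second print still shows the original value.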
Whenever Python encounters an identifier (of a variable, function, etc.), it will search for the identifier definition in the following order:

1. The enclosing function call namespace
2. The global (module) namespace
3. The builtins module namespace

Question: Suppose we are inside a function, and have a local variable with the same identifier as a global variable. Is there a way to refer to the global variable?

Answer: Yes. We use the statement 'global'. See scope3.py for an example.

The builtins scope is the namespace of many "built-in" identifiers in Python. That is, 'builtins' is just the namespace of a pre-existing library module that is automatically made available upon starting Python.

You can also import this module explicitly and check the names defined in it using the dir() function:

>>> import builtins
>>> dir(builtins)
['ArithmeticError', 'AssertionError', 'AttributeError', 'BaseException', … 'tuple', 'type', 'vars', 'zip']