


Pandas for Everyone: Python Data Analysis, First Edition
Python for Data Analysis, 2nd edition

Lists

Lists are a fundamental data structure in Python. They are used to store heterogeneous data, and are created with a pair of square brackets, [].

    my_list = ['a', 1, True, 3.14]

We can subset the list using square brackets and provide the index of the item we want.

    # get the first item
    print(my_list[0])
    a

We can also pass in a range of values (Appendix L).

    # get the first 3 values
    print(my_list[:3])
    ['a', 1, True]

We can reassign values when we subset values from the list.

    # reassign the first value
    my_list[0] = 'zzzzz'
    print(my_list)
    ['zzzzz', 1, True, 3.14]

Lists are objects in Python (Appendix S), so they have methods that they can perform. For example, we can append values to the list.

    my_list.append('appended a new value!')
    print(my_list)
    ['zzzzz', 1, True, 3.14, 'appended a new value!']

More about lists and their various methods can be found in the documentation.

In contrast with tuples, lists are variable-length and their contents can be modified in-place.
You can define them using square brackets [] or using the list type function:

    In [37]: a_list = [2, 3, 7, None]
    In [38]: tup = ('foo', 'bar', 'baz')
    In [39]: b_list = list(tup)
    In [40]: b_list
    Out[40]: ['foo', 'bar', 'baz']
    In [41]: b_list[1] = 'peekaboo'
    In [42]: b_list
    Out[42]: ['foo', 'peekaboo', 'baz']

Lists and tuples are semantically similar (though tuples cannot be modified) and can be used interchangeably in many functions.

The list function is frequently used in data processing as a way to materialize an iterator or generator expression:

    In [43]: gen = range(10)
    In [44]: gen
    Out[44]: range(0, 10)
    In [45]: list(gen)
    Out[45]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Adding and removing elements

Elements can be appended to the end of the list with the append method:

    In [46]: b_list.append('dwarf')
    In [47]: b_list
    Out[47]: ['foo', 'peekaboo', 'baz', 'dwarf']

Using insert you can insert an element at a specific location in the list:

    In [48]: b_list.insert(1, 'red')
    In [49]: b_list
    Out[49]: ['foo', 'red', 'peekaboo', 'baz', 'dwarf']

The insertion index must be between 0 and the length of the list, inclusive.

WARNING
insert is computationally expensive compared with append, because references to subsequent elements have to be shifted internally to make room for the new element.
If you need to insert elements at both the beginning and end of a sequence, you may wish to explore collections.deque, a double-ended queue, for this purpose.

The inverse operation to insert is pop, which removes and returns an element at a particular index:

    In [50]: b_list.pop(2)
    Out[50]: 'peekaboo'
    In [51]: b_list
    Out[51]: ['foo', 'red', 'baz', 'dwarf']

Elements can be removed by value with remove, which locates the first such value and removes it from the list:

    In [52]: b_list.append('foo')
    In [53]: b_list
    Out[53]: ['foo', 'red', 'baz', 'dwarf', 'foo']
    In [54]: b_list.remove('foo')
    In [55]: b_list
    Out[55]: ['red', 'baz', 'dwarf', 'foo']

If performance is not a concern, by using append and remove, you can use a Python list as a perfectly suitable “multiset” data structure.

Check if a list contains a value using the in keyword:

    In [56]: 'dwarf' in b_list
    Out[56]: True

The keyword not can be used to negate in:

    In [57]: 'dwarf' not in b_list
    Out[57]: False

Checking whether a list contains a value is a lot slower than doing so with dicts and sets (to be introduced shortly), as Python makes a linear scan across the values of the list, whereas it can check the others (based on hash tables) in constant time.

Concatenating and combining lists

Similar to tuples, adding two lists together with + concatenates them:

    In [58]: [4, None, 'foo'] + [7, 8, (2, 3)]
    Out[58]: [4, None, 'foo', 7, 8, (2, 3)]

If you have a list already defined, you can append multiple elements to it using the extend method:

    In [59]: x = [4, None, 'foo']
    In [60]: x.extend([7, 8, (2, 3)])
    In [61]: x
    Out[61]: [4, None, 'foo', 7, 8, (2, 3)]

Note that list concatenation by addition is a comparatively expensive operation since a new list must be created and the objects copied over. Using extend to append elements to an existing list, especially if you are building up a large list, is usually preferable.
Thus,

    everything = []
    for chunk in list_of_lists:
        everything.extend(chunk)

is faster than the concatenative alternative:

    everything = []
    for chunk in list_of_lists:
        everything = everything + chunk

Sorting

You can sort a list in-place (without creating a new object) by calling its sort method:

    In [62]: a = [7, 2, 5, 1, 3]
    In [63]: a.sort()
    In [64]: a
    Out[64]: [1, 2, 3, 5, 7]

sort has a few options that will occasionally come in handy. One is the ability to pass a secondary sort key, that is, a function that produces a value to use to sort the objects. For example, we could sort a collection of strings by their lengths:

    In [65]: b = ['saw', 'small', 'He', 'foxes', 'six']
    In [66]: b.sort(key=len)
    In [67]: b
    Out[67]: ['He', 'saw', 'six', 'small', 'foxes']

Soon, we’ll look at the sorted function, which can produce a sorted copy of a general sequence.

Binary search and maintaining a sorted list

The built-in bisect module implements binary search and insertion into a sorted list. bisect.bisect finds the location where an element should be inserted to keep it sorted, while bisect.insort actually inserts the element into that location:

    In [68]: import bisect
    In [69]: c = [1, 2, 2, 2, 3, 4, 7]
    In [70]: bisect.bisect(c, 2)
    Out[70]: 4
    In [71]: bisect.bisect(c, 5)
    Out[71]: 6
    In [72]: bisect.insort(c, 6)
    In [73]: c
    Out[73]: [1, 2, 2, 2, 3, 4, 6, 7]

CAUTION
The bisect module functions do not check whether the list is sorted, as doing so would be computationally expensive.
Thus, using them with an unsorted list will succeed without error but may lead to incorrect results.

Slicing

You can select sections of most sequence types by using slice notation, which in its basic form consists of start:stop passed to the indexing operator []:

    In [74]: seq = [7, 2, 3, 7, 5, 6, 0, 1]
    In [75]: seq[1:5]
    Out[75]: [2, 3, 7, 5]

Slices can also be assigned to with a sequence:

    In [76]: seq[3:4] = [6, 3]
    In [77]: seq
    Out[77]: [7, 2, 3, 6, 3, 5, 6, 0, 1]

While the element at the start index is included, the stop index is not included, so that the number of elements in the result is stop - start.

Either the start or stop can be omitted, in which case they default to the start of the sequence and the end of the sequence, respectively:

    In [78]: seq[:5]
    Out[78]: [7, 2, 3, 6, 3]
    In [79]: seq[3:]
    Out[79]: [6, 3, 5, 6, 0, 1]

Negative indices slice the sequence relative to the end:

    In [80]: seq[-4:]
    Out[80]: [5, 6, 0, 1]
    In [81]: seq[-6:-2]
    Out[81]: [6, 3, 5, 6]

Slicing semantics takes a bit of getting used to, especially if you’re coming from R or MATLAB. See Figure 3-1 for a helpful illustration of slicing with positive and negative integers. In the figure, the indices are shown at the “bin edges” to help show where the slice selections start and stop using positive or negative indices.

A step can also be used after a second colon to, say, take every other element:

    In [82]: seq[::2]
    Out[82]: [7, 3, 3, 6, 1]

A clever use of this is to pass -1, which has the useful effect of reversing a list or tuple:

    In [83]: seq[::-1]
    Out[83]: [1, 0, 6, 5, 3, 6, 3, 2, 7]

Figure 3-1. Illustration of Python slicing conventions

Tuple

A tuple is a fixed-length, immutable sequence of Python objects.
The easiest way to create one is with a comma-separated sequence of values:

    In [2]: tup = 4, 5, 6
    In [3]: tup
    Out[3]: (4, 5, 6)

When you’re defining tuples in more complicated expressions, it’s often necessary to enclose the values in parentheses, as in this example of creating a tuple of tuples:

    In [4]: nested_tup = (4, 5, 6), (7, 8)
    In [5]: nested_tup
    Out[5]: ((4, 5, 6), (7, 8))

You can convert any sequence or iterator to a tuple by invoking tuple:

    In [6]: tuple([4, 0, 2])
    Out[6]: (4, 0, 2)
    In [7]: tup = tuple('string')
    In [8]: tup
    Out[8]: ('s', 't', 'r', 'i', 'n', 'g')

Elements can be accessed with square brackets [] as with most other sequence types. As in C, C++, Java, and many other languages, sequences are 0-indexed in Python:

    In [9]: tup[0]
    Out[9]: 's'

While the objects stored in a tuple may be mutable themselves, once the tuple is created it’s not possible to modify which object is stored in each slot:

    In [10]: tup = tuple(['foo', [1, 2], True])
    In [11]: tup[2] = False
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-11-c7308343b841> in <module>()
    ----> 1 tup[2] = False
    TypeError: 'tuple' object does not support item assignment

If an object inside a tuple is mutable, such as a list, you can modify it in-place:

    In [12]: tup[1].append(3)
    In [13]: tup
    Out[13]: ('foo', [1, 2, 3], True)

You can concatenate tuples using the + operator to produce longer tuples:

    In [14]: (4, None, 'foo') + (6, 0) + ('bar',)
    Out[14]: (4, None, 'foo', 6, 0, 'bar')

Multiplying a tuple by an integer, as with lists, has the effect of concatenating together that many copies of the tuple:

    In [15]: ('foo', 'bar') * 4
    Out[15]: ('foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar')

Note that the objects themselves are not copied, only the references to them.

Unpacking tuples

If you try to assign to a tuple-like expression of variables, Python will attempt to unpack the value on the righthand side of the equals sign:

    In [16]: tup = (4, 5, 6)
    In [17]: a, b, c = tup
    In [18]: b
    Out[18]: 5

Even sequences with nested tuples can be unpacked:

    In [19]: tup = 4, 5, (6, 7)
    In [20]: a, b, (c, d) = tup
    In [21]: d
    Out[21]: 7

Using this functionality you can easily swap variable names, a task which in many languages might look like:

    tmp = a
    a = b
    b = tmp

But, in Python, the swap can be done like this:

    In [22]: a, b = 1, 2
    In [23]: a
    Out[23]: 1
    In [24]: b
    Out[24]: 2
    In [25]: b, a = a, b
    In [26]: a
    Out[26]: 2
    In [27]: b
    Out[27]: 1

A common use of variable unpacking is iterating over sequences of tuples or lists:

    In [28]: seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
    In [29]: for a, b, c in seq:
       ....:     print('a={0}, b={1}, c={2}'.format(a, b, c))
    a=1, b=2, c=3
    a=4, b=5, c=6
    a=7, b=8, c=9

Another common use is returning multiple values from a function. I’ll cover this in more detail later.

The Python language recently acquired some more advanced tuple unpacking to help with situations where you may want to “pluck” a few elements from the beginning of a tuple. This uses the special syntax *rest, which is also used in function signatures to capture an arbitrarily long list of positional arguments:

    In [30]: values = 1, 2, 3, 4, 5
    In [31]: a, b, *rest = values
    In [32]: a, b
    Out[32]: (1, 2)
    In [33]: rest
    Out[33]: [3, 4, 5]

This rest bit is sometimes something you want to discard; there is nothing special about the rest name. As a matter of convention, many Python programmers will use the underscore (_) for unwanted variables:

    In [34]: a, b, *_ = values

Tuple methods

Since the size and contents of a tuple cannot be modified, it is very light on instance methods. A particularly useful one (also available on lists) is count, which counts the number of occurrences of a value:

    In [35]: a = (1, 2, 2, 2, 3, 4, 2)
    In [36]: a.count(2)
    Out[36]: 4

A tuple is similar to a list, in that both can hold heterogeneous bits of information. The main difference is that the contents of a tuple are “immutable,” meaning they cannot be changed.
They are also created with a pair of round brackets, ( ).

    my_tuple = ('a', 1, True, 3.14)

Subsetting items is accomplished in exactly the same ways as for a list (i.e., you use square brackets).

    # get the first item
    print(my_tuple[0])
    a

However, if we try to change the contents of an index, we will get an error.

    # this will cause an error
    my_tuple[0] = 'zzzzz'
    Traceback (most recent call last):
      File "<ipython-input-1-3689669e7d2b>", line 2, in <module>
        my_tuple[0] = 'zzzzz'
    TypeError: 'tuple' object does not support item assignment

More information about tuples can be found in the documentation.

dict is likely the most important built-in Python data structure. A more common name for it is hash map or associative array. It is a flexibly sized collection of key-value pairs, where key and value are Python objects. One approach for creating one is to use curly braces {} and colons to separate keys and values:

    In [102]: empty_dict = {}
    In [103]: d1 = {'a' : 'some value', 'b' : [1, 2, 3, 4]}
    In [104]: d1
    Out[104]: {'a': 'some value', 'b': [1, 2, 3, 4]}

You can access, insert, or set elements using the same syntax as for accessing elements of a list or tuple:

    In [105]: d1[7] = 'an integer'
    In [106]: d1
    Out[106]: {'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}
    In [107]: d1['b']
    Out[107]: [1, 2, 3, 4]

You can check if a dict contains a key using the same syntax used for checking whether a list or tuple contains a value:

    In [108]: 'b' in d1
    Out[108]: True

You can delete values either using the del keyword or the pop method (which simultaneously returns the value and deletes the key):

    In [109]: d1[5] = 'some value'
    In [110]: d1
    Out[110]: {'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer', 5: 'some value'}
    In [111]: d1['dummy'] = 'another value'
    In [112]: d1
    Out[112]: {'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer', 5: 'some value', 'dummy': 'another value'}
    In [113]: del d1[5]
    In [114]: d1
    Out[114]: {'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer', 'dummy': 'another value'}
    In [115]: ret = d1.pop('dummy')
    In [116]: ret
    Out[116]: 'another value'
    In [117]: d1
    Out[117]: {'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}

The keys and values methods give you iterators of the dict’s keys and values, respectively. Both methods output their items in the same order as each other (since Python 3.7, this is the order in which the keys were inserted):

    In [118]: list(d1.keys())
    Out[118]: ['a', 'b', 7]
    In [119]: list(d1.values())
    Out[119]: ['some value', [1, 2, 3, 4], 'an integer']

You can merge one dict into another using the update method:

    In [120]: d1.update({'b' : 'foo', 'c' : 12})
    In [121]: d1
    Out[121]: {'a': 'some value', 'b': 'foo', 7: 'an integer', 'c': 12}

The update method changes dicts in-place, so any existing keys in the data passed to update will have their old values discarded.

Creating dicts from sequences

It’s common to end up with two sequences that you want to pair up element-wise in a dict. As a first cut, you might write code like this:

    mapping = {}
    for key, value in zip(key_list, value_list):
        mapping[key] = value

Since a dict is essentially a collection of 2-tuples, the dict function accepts a list of 2-tuples:

    In [122]: mapping = dict(zip(range(5), reversed(range(5))))
    In [123]: mapping
    Out[123]: {0: 4, 1: 3, 2: 2, 3: 1, 4: 0}

Later we’ll talk about dict comprehensions, another elegant way to construct dicts.

Default values

It’s very common to have logic like:

    if key in some_dict:
        value = some_dict[key]
    else:
        value = default_value

Thus, the dict methods get and pop can take a default value to be returned, so that the above if-else block can be written simply as:

    value = some_dict.get(key, default_value)

get by default will return None if the key is not present, while pop will raise an exception. With setting values, a common case is for the values in a dict to be other collections, like lists.
For example, you could imagine categorizing a list of words by their first letters as a dict of lists:

    In [124]: words = ['apple', 'bat', 'bar', 'atom', 'book']
    In [125]: by_letter = {}
    In [126]: for word in words:
       .....:     letter = word[0]
       .....:     if letter not in by_letter:
       .....:         by_letter[letter] = [word]
       .....:     else:
       .....:         by_letter[letter].append(word)
       .....:
    In [127]: by_letter
    Out[127]: {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

The setdefault dict method is for precisely this purpose. The preceding for loop can be rewritten as:

    for word in words:
        letter = word[0]
        by_letter.setdefault(letter, []).append(word)

The built-in collections module has a useful class, defaultdict, which makes this even easier. To create one, you pass a type or function for generating the default value for each slot in the dict:

    from collections import defaultdict
    by_letter = defaultdict(list)
    for word in words:
        by_letter[word[0]].append(word)

Valid dict key types

While the values of a dict can be any Python object, the keys generally have to be immutable objects like scalar types (int, float, string) or tuples (all the objects in the tuple need to be immutable, too). The technical term here is hashability.
You can check whether an object is hashable (can be used as a key in a dict) with the hash function:

    In [128]: hash('string')
    Out[128]: 5330554102147468818
    In [129]: hash((1, 2, (2, 3)))
    Out[129]: 1097636502276347782
    In [130]: hash((1, 2, [2, 3])) # fails because lists are mutable
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-130-800cd14ba8be> in <module>()
    ----> 1 hash((1, 2, [2, 3])) # fails because lists are mutable
    TypeError: unhashable type: 'list'

To use a list as a key, one option is to convert it to a tuple, which can be hashed as long as its elements also can:

    In [131]: d = {}
    In [132]: d[tuple([1, 2, 3])] = 5
    In [133]: d
    Out[133]: {(1, 2, 3): 5}

Python dictionaries (dict) are efficient ways of storing information. Just as an actual dictionary stores a word and its corresponding definition, so a Python dict stores some key and a corresponding value. Using dictionaries can make your code more readable because a label is assigned to each value in the dictionary. Contrast this with list objects, which are unlabeled. Dictionaries are created by using a set of curly brackets, {}.

    my_dict = {}
    print(my_dict)
    {}
    print(type(my_dict))
    <class 'dict'>

When we have a dict, we can add values to it by using square brackets, []. We put the key inside these square brackets. Usually, it is some string, but it can actually be any immutable type (e.g., a Python tuple, which is the immutable form of a Python list). Here we create two keys, fname and lname, for a first name and last name, respectively.

    my_dict['fname'] = 'Daniel'
    my_dict['lname'] = 'Chen'

We can also create a dictionary directly, with key–value pairs instead of adding them one at a time.
To do this, we use our curly brackets, with the key–value pairs being specified by a colon.

    my_dict = {'fname': 'Daniel', 'lname': 'Chen'}
    print(my_dict)
    {'fname': 'Daniel', 'lname': 'Chen'}

To get the values from our keys, we can use the square brackets with the key inside.

    fn = my_dict['fname']
    print(fn)
    Daniel

We can also use the get method.

    ln = my_dict.get('lname')
    print(ln)
    Chen

The main difference between these two ways of getting the values from the dictionary is the behavior that occurs when you try to get a nonexistent key. When using the square brackets notation, trying to get a key that does not exist will raise an error.

    # will raise an error
    print(my_dict['age'])
    Traceback (most recent call last):
      File "<ipython-input-1-404b91316179>", line 2, in <module>
        print(my_dict['age'])
    KeyError: 'age'

In contrast, the get method will return None.

    # will return None
    print(my_dict.get('age'))
    None

To get all the keys from the dict, we can use the keys method.

    # get all the keys in the dictionary
    print(my_dict.keys())
    dict_keys(['fname', 'lname'])

To get all the values from the dict, we can use the values method.

    # get all the values in the dictionary
    print(my_dict.values())
    dict_values(['Daniel', 'Chen'])

To get every key–value pair, you can use the items method. This can be useful if you need to loop through a dictionary.

    print(my_dict.items())
    dict_items([('fname', 'Daniel'), ('lname', 'Chen')])

Each key–value pair is returned in the form of a tuple, as indicated by the use of round brackets, ().

More on dictionaries can be found in the official documentation on data structures.

Values

Python is a zero-indexed language (things start counting from zero), and is also left inclusive, right exclusive when you are specifying a range of values. This applies to objects like lists and Series, where the first element has a position (index) of 0.
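As a small illustration of the zero-indexed, left inclusive, right exclusive convention, here is a minimal sketch. The names letters, first, middle, and counted are made up for this example and do not come from the text.

```python
# Hypothetical example data; the variable names are illustrative only.
letters = ['a', 'b', 'c', 'd']

# Zero-indexed: position 0 holds the first element.
first = letters[0]

# Left inclusive, right exclusive: positions 1 and 2 are returned, 3 is not.
middle = letters[1:3]

# range follows the same convention: 1 is included, 3 is excluded.
counted = list(range(1, 3))

print(first)    # a
print(middle)   # ['b', 'c']
print(counted)  # [1, 2]
```

Note that the number of items returned is the right index minus the left index, which is one reason the convention is convenient.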
When creating ranges or slicing a range of values from a list-like object, we need to specify both the beginning index and the ending index. This is where the left inclusive, right exclusive terminology comes into play. The left index will be included in the returned range or slice, but the right index will not.

Think of items in a list-like object as being fenced in. The index represents the fence post. When we specify a range or a slice, we are actually referring to the fence posts, so that everything between the posts is returned.

Figure L.1 illustrates why this may be the case. When we slice from 0 to 1, we get only one value back; when we slice from 1 to 3, we get two values back.

    l = ['one', 'two', 'three']
    print(l[0:1])
    ['one']
    print(l[1:3])
    ['two', 'three']

Figure L.1 Use of fence posts to depict slicing syntax

The slicing notation used, :, comes in two parts. The value on the left denotes the starting value (left inclusive), and the value on the right denotes the ending value (right exclusive). We can leave one of these values blank, and the slicing will start from the beginning (if we leave the left value blank) or go to the end (if we leave the right value blank).

    print(l[1:])
    ['two', 'three']
    print(l[:3])
    ['one', 'two', 'three']

We can add a second colon, which refers to the “step.” For example, if we have a step value of 2, then for whatever range we specified using the first colon, the returned value will be every other value from the range.

    # get every other value starting from the first value
    print(l[::2])
    ['one', 'three']

Loops

Loops provide a means to perform the same action across multiple items. Multiple items are typically stored in a Python list object. Any list-like object can be iterated over (e.g., tuples, arrays, dataframes, dictionaries). More information on loops can be found in the Software-Carpentry Python lesson on loops.

To loop over a list, we use a for statement.
The basic for loop looks like this:

    for item in container:
        # do something

The container represents some iterable set of values (e.g., a list). The item represents a temporary variable that represents each item in the iterable. In the for statement, the first element of the container is assigned to the temporary variable (in this example, item). Everything in the indented block after the colon is then performed. When it gets to the end of the loop, the code assigns the next element in the iterable to the temporary variable and performs the steps over again.

    # an example list of values to iterate over
    l = [1, 2, 3]

    # write a for loop that prints the value and its squared value
    for i in l:
        # print the current value
        print('the current value is: {}'.format(i))

        # print the square of the value
        print("its squared value is: {}".format(i*i))

        # end of the loop, the \n at the end creates a new line
        print('end of loop, going back to the top\n')

    the current value is: 1
    its squared value is: 1
    end of loop, going back to the top

    the current value is: 2
    its squared value is: 4
    end of loop, going back to the top

    the current value is: 3
    its squared value is: 9
    end of loop, going back to the top

Comprehensions

A typical task in Python is to iterate over a list, run some function on each value, and save the results into a new list.

    # create a list
    l = [1, 2, 3, 4, 5]

    # list of newly calculated results
    r = []

    # iterate over the list
    for i in l:
        # square each number and add the new value to a new list
        r.append(i ** 2)

    print(r)
    [1, 4, 9, 16, 25]

Unfortunately, this approach requires a few lines of code to do a relatively simple task. One way to rewrite this loop more compactly is by using a Python list comprehension.
This shortcut offers a concise way of performing the same action.

    # note the square brackets on the right-hand side
    # this saves the final results as a list
    rc = [i ** 2 for i in l]
    print(rc)
    [1, 4, 9, 16, 25]
    print(type(rc))
    <class 'list'>

Our final results will be a list, so the right-hand side will have a pair of square brackets. From there, we write what looks very similar to a for loop. Starting from the center and moving toward the right side, we write for i in l, which is very similar to the first line of our original for loop. On the left side, we write i ** 2, which is similar to the body of the for loop. Since we are using a list comprehension, we no longer need to specify the list to which we want to append our new values.

Functions

Functions are one of the cornerstones of programming. They provide a way to reuse code. If you’ve ever copy-pasted lines of code just to change a few parameters, then turning those lines of code into a function not only makes your code more readable, but also prevents you from making mistakes later on. Every time code is copy-pasted, it adds another place to look if a correction is needed, and puts that burden on the programmer. When you use a function, you need to make a correction only once, and it will be applied every time the function is called.

I highly suggest the Software-Carpentry Python episode on functions for more details.

An empty function looks like this:

    def empty_function():
        pass

The function begins with the def keyword, then the function name (i.e., how the function will be called and used), a set of round brackets, and a colon. The body of the function is indented (1 tab or 4 spaces). This indentation is extremely important. If you omit it, you will get an error.
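To see the kind of error a missing indent produces without stopping a script, one option is to hand the source text to the built-in compile function and catch the exception. This is only a sketch; the names good_source, bad_source, and compiles are invented for illustration.

```python
# Two versions of the same tiny function, held as strings so that the
# badly indented one can be checked without halting the program.
good_source = "def empty_function():\n    pass\n"
bad_source = "def empty_function():\npass\n"  # the body is not indented


def compiles(source):
    """Return True if the source text compiles, False on an indentation error."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError:
        return False
    return True


print(compiles(good_source))  # True
print(compiles(bad_source))   # False
```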
In this example, pass is used as a placeholder to do nothing.

Typically functions will have what’s called a “docstring”: a multiple-line comment that describes the function’s purpose, parameters, and output, and that sometimes contains testing code. When you look up help documentation about a function in Python, the information contained in the function docstring is usually what shows up. This allows the function’s documentation and code to travel together, which makes the documentation easier to maintain.

    def empty_function():
        """This is an empty function with a docstring.
        These docstrings are used to help document the function.
        They can be created by using 3 single quotes or 3 double quotes.
        The PEP-8 style guide says to use double quotes.
        """
        pass # this function still does nothing

Functions need not have parameters to be called.

    def print_value():
        """Just prints the value 3
        """
        print(3)

    # call our print_value function
    print_value()
    3

Functions can take parameters as well. We can modify our print_value function so that it prints whatever value we pass into the function.

    def print_value(value):
        """Prints the value passed into the parameter 'value'
        """
        print(value)

    print_value(3)
    3
    print_value("Hello!")
    Hello!

Functions can take multiple values as well.

    def person(fname, lname, sex):
        """A function that takes 3 values, and prints them
        """
        print(fname)
        print(lname)
        print(sex)

    person('Daniel', 'Chen', 'Male')
    Daniel
    Chen
    Male

The examples thus far have simply created functions that printed values. What makes functions powerful is their ability to take inputs and return an output, not just print values to the screen.
To accomplish this, we can use the return statement.

    def my_mean_2(x, y):
        """A function that returns the mean of 2 values
        """
        mean_value = (x + y) / 2
        return mean_value

    m = my_mean_2(0, 10)
    print(m)
    5.0

O.1 Default Parameters

Functions can also have default values. In fact, many of the functions found in various libraries have default values. These defaults allow users to type less because users now have to specify just a minimal amount of information for the function, but also give users the flexibility to make changes to the function’s behavior if desired. Default values are also useful if you have your own functions and want to add more features without breaking your existing code.

    def my_mean_3(x, y, z=20):
        """A function with a parameter z that has a default value
        """
        # you can also directly return values without having to create
        # an intermediate variable
        return (x + y + z) / 3

Here we need to specify only x and y.

    print(my_mean_3(10, 15))
    15.0

We can also specify z if we want to override its default value.

    print(my_mean_3(0, 50, 100))
    50.0

O.2 Arbitrary Parameters

Sometimes function documentation includes the terms *args and **kwargs. These stand for “arguments” and “keyword arguments,” respectively. They allow the function author to capture an arbitrary number of arguments into the function.
They may also provide a means for the user to pass arguments into another function that is called within the current function.

O.2.1 *args

Let’s write a more generic mean function that can take an arbitrary number of values.

    def my_mean(*args):
        """Calculate the mean for an arbitrary number of values
        """
        # add up all the values
        total = 0
        for i in args:
            total += i
        return total / len(args)

    print(my_mean(0, 10))
    5.0
    print(my_mean(0, 50, 100))
    50.0
    print(my_mean(3, 10, 25, 2))
    10.0

O.2.2 **kwargs

**kwargs is similar to *args, but instead of acting like an arbitrary list of values, it is used like a dictionary; that is, it captures arbitrary key–value pairs.

    def greetings(welcome_word, **kwargs):
        """Prints out a greeting to a person,
        where the person's fname and lname are provided by the kwargs
        """
        print(welcome_word)
        print(kwargs.get('fname'))
        print(kwargs.get('lname'))

    greetings('Hello!', fname='Daniel', lname='Chen')
    Hello!
    Daniel
    Chen

Sequence Functions

Python has a handful of useful sequence functions that you should familiarize yourself with and use at any opportunity.

enumerate

It’s common when iterating over a sequence to want to keep track of the index of the current item.
A do-it-yourself approach would look like:

    i = 0
    for value in collection:
        # do something with value
        i += 1

Since this is so common, Python has a built-in function, enumerate, which returns a sequence of (i, value) tuples:

    for i, value in enumerate(collection):
        # do something with value

When you are indexing data, a helpful pattern that uses enumerate is computing a dict mapping the values of a sequence (which are assumed to be unique) to their locations in the sequence:

    In [84]: some_list = ['foo', 'bar', 'baz']
    In [85]: mapping = {}
    In [86]: for i, v in enumerate(some_list):
       ....:     mapping[v] = i
    In [87]: mapping
    Out[87]: {'bar': 1, 'baz': 2, 'foo': 0}

sorted

The sorted function returns a new sorted list from the elements of any sequence:

    In [88]: sorted([7, 1, 2, 6, 0, 3, 2])
    Out[88]: [0, 1, 2, 2, 3, 6, 7]
    In [89]: sorted('horse race')
    Out[89]: [' ', 'a', 'c', 'e', 'e', 'h', 'o', 'r', 'r', 's']

The sorted function accepts the same arguments as the sort method on lists.

zip

zip “pairs” up the elements of a number of lists, tuples, or other sequences to create a list of tuples:

    In [90]: seq1 = ['foo', 'bar', 'baz']
    In [91]: seq2 = ['one', 'two', 'three']
    In [92]: zipped = zip(seq1, seq2)
    In [93]: list(zipped)
    Out[93]: [('foo', 'one'), ('bar', 'two'), ('baz', 'three')]

zip can take an arbitrary number of sequences, and the number of elements it produces is determined by the shortest sequence:

    In [94]: seq3 = [False, True]
    In [95]: list(zip(seq1, seq2, seq3))
    Out[95]: [('foo', 'one', False), ('bar', 'two', True)]

A very common use of zip is simultaneously iterating over multiple sequences, possibly also combined with enumerate:

    In [96]: for i, (a, b) in enumerate(zip(seq1, seq2)):
       ....:     print('{0}: {1}, {2}'.format(i, a, b))
       ....:
    0: foo, one
    1: bar, two
    2: baz, three

Given a “zipped” sequence, zip can be applied in a clever way to “unzip” the sequence. Another way to think about this is converting a list of rows into a list of columns.
The syntax, which looks a bit magical, is:

    In [97]: pitchers = [('Nolan', 'Ryan'), ('Roger', 'Clemens'),
       ....:             ('Curt', 'Schilling')]
    In [98]: first_names, last_names = zip(*pitchers)
    In [99]: first_names
    Out[99]: ('Nolan', 'Roger', 'Curt')
    In [100]: last_names
    Out[100]: ('Ryan', 'Clemens', 'Schilling')

reversed

reversed iterates over the elements of a sequence in reverse order:

    In [101]: list(reversed(range(10)))
    Out[101]: [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

Keep in mind that reversed is a generator (to be discussed in some more detail later), so it does not create the reversed sequence until materialized (e.g., with list or a for loop).

Advanced pandas

Background and Motivation

Frequently, a column in a table may contain repeated instances of a smaller set of distinct values. We have already seen functions like unique and value_counts, which enable us to extract the distinct values from an array and compute their frequencies, respectively:

    In [12]: import numpy as np; import pandas as pd
    In [13]: values = pd.Series(['apple', 'orange', 'apple',
       ....:                     'apple'] * 2)
    In [14]: values
    Out[14]:
    0     apple
    1    orange
    2     apple
    3     apple
    4     apple
    5    orange
    6     apple
    7     apple
    dtype: object
    In [15]: pd.unique(values)
    Out[15]: array(['apple', 'orange'], dtype=object)
    In [16]: pd.value_counts(values)
    Out[16]:
    apple     6
    orange    2
    dtype: int64

Many data systems (for data warehousing, statistical computing, or other uses) have developed specialized approaches for representing data with repeated values for more efficient storage and computation.
In data warehousing, a best practice is to use so-called dimension tables containing the distinct values and storing the primary observations as integer keys referencing the dimension table:

    In [17]: values = pd.Series([0, 1, 0, 0] * 2)
    In [18]: dim = pd.Series(['apple', 'orange'])
    In [19]: values
    Out[19]:
    0    0
    1    1
    2    0
    3    0
    4    0
    5    1
    6    0
    7    0
    dtype: int64
    In [20]: dim
    Out[20]:
    0     apple
    1    orange
    dtype: object

We can use the take method to restore the original Series of strings:

    In [21]: dim.take(values)
    Out[21]:
    0     apple
    1    orange
    0     apple
    0     apple
    0     apple
    1    orange
    0     apple
    0     apple
    dtype: object

This representation as integers is called the categorical or dictionary-encoded representation. The array of distinct values can be called the categories, dictionary, or levels of the data. In this book we will use the terms categorical and categories. The integer values that reference the categories are called the category codes or simply codes.

The categorical representation can yield significant performance improvements when you are doing analytics. You can also perform transformations on the categories while leaving the codes unmodified. Some example transformations that can be made at relatively low cost are:

Renaming categories
Appending a new category without changing the order or position of the existing categories
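The dictionary-encoded representation described above can be sketched in plain Python. This illustrates the idea only and is not how pandas implements it; the helper names encode and decode are hypothetical.

```python
def encode(values):
    """Split a list into (categories, codes) in order of first appearance."""
    categories = []
    positions = {}  # maps each category to its integer code
    codes = []
    for value in values:
        if value not in positions:
            positions[value] = len(categories)
            categories.append(value)
        codes.append(positions[value])
    return categories, codes


def decode(categories, codes):
    """Rebuild the original values by looking each code up in categories."""
    return [categories[code] for code in codes]


values = ['apple', 'orange', 'apple', 'apple'] * 2
categories, codes = encode(values)
print(categories)  # ['apple', 'orange']
print(codes)       # [0, 1, 0, 0, 0, 1, 0, 0]

# Renaming a category touches only the small categories list;
# the codes are left unmodified.
renamed = ['APPLE' if c == 'apple' else c for c in categories]
print(decode(renamed, codes)[:3])  # ['APPLE', 'orange', 'APPLE']
```

The renaming step shows why such transformations are cheap: the work is proportional to the number of distinct categories, not to the number of observations.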

