Lixiangcx.files.wordpress.com



Python for Data Science (DataCamp)

ch1: Python basics
print()
type(): int, float, str, bool
# In Python, double quotes " " and single quotes ' ' have identical functionality, unlike PHP or Bash
In [16]: 2 + 3
Out[16]: 5
In [17]: 'ab' + 'cd'
Out[17]: 'abcd'
help(function): open up documentation

ch2: Python list
[a, b, c]
Can contain any type
Can contain different types
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]
fam2 = [["liz", 1.73], ["emma", 1.68], ["mom", 1.71], ["dad", 1.89]]
In [13]: type(fam)
Out[13]: list
In [14]: type(fam2)
Out[14]: list
Subsetting lists
fam[3]: 1.68
fam[-1]: 1.89
fam[3:5]: [1.68, 'mom']    [start : end], start inclusive, end exclusive
fam[:4]: ['liz', 1.73, 'emma', 1.68]
Adding elements: fam + ["me", 1.79]
Delete elements: del(fam[2])
# Note: a list is not a primitive type, so after y = x, y refers to the same list as x. Any in-place change made through y also changes x.

ch3: Functions & Packages
max(list)  # maximum of a list; the elements must be mutually comparable, so max(fam) on the mixed str/float list above raises TypeError in Python 3
len()  # length of a list or string
round()  # round(number[, ndigits]): round a number to a given precision in decimal digits (default 0)
round(1.68, 1): 1.7
list.count(): method that counts how many times an element occurs in a list and returns that count
fam.append("me")
Methods
Everything = object
Objects have methods associated, depending on type

ch4: NumPy
List recap: powerful, collection of different types, change/add/remove
But lists lack mathematical operations over collections, and speed
for example: [1, 2, 3] * 2 repeats the list ([1, 2, 3, 1, 2, 3]) instead of doubling each element
Solution: NumPy
Numeric Python
Alternative to Python list: NumPy array
Calculations over entire arrays
Easy and fast
Installation in the terminal: pip3 install numpy
NumPy functions: np.mean, np.median, np.corrcoef, np.std, np.sort, np.sum

intermediate_ch1: Matplotlib
Functions: Visualization; Data structures; Control structures; Case study
plt.scatter(x, y)
help(plt.hist)
plt.xlabel('Year')
plt.title('…')
plt.yticks([0, 2, 4, 6, 8], ['Germany', 'Dutch', 'China', 'US', 'UK'])

intermediate_ch2: Dictionaries & Pandas
dict_name[key]
result: value
Dictionaries can contain key: value pairs where the values are again
dictionaries.
europe = { 'spain': { 'capital':'madrid', 'population':46.77 },
           'france': { 'capital':'paris', 'population':66.03 },
           'germany': { 'capital':'berlin', 'population':80.62 },
           'norway': { 'capital':'oslo', 'population':5.084 } }
# Print out the capital of France
print(europe['france']['capital'])
# Create sub-dictionary data
data = {'capital':'rome', 'population':59.83}
# Add data to europe under key 'italy'
europe['italy'] = data
print(europe)

Pandas
Pandas is an open source library, providing high-performance, easy-to-use data structures and data analysis tools for Python. The DataFrame is one of Pandas' most important data structures. It's basically a way to store tabular data where you can label the rows and the columns. One way to build a DataFrame is from a dictionary.
Index and select data
Square brackets
Advanced methods: loc, iloc
# Note: The single bracket version gives a Pandas Series, the double bracket version gives a Pandas DataFrame.
loc and iloc allow you to select both rows and columns from a DataFrame.
# Note: about differences between Pandas Series and DataFrame
pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure.
So the Series is the data structure for a single column of a DataFrame, not only conceptually, but literally, i.e.
the data in a DataFrame is actually stored in memory as a collection of Series.

intermediate_ch3: Comparison Operators
Comparison Operators: how Python values relate
<, >, <=, >=, ==, !=
Boolean Operators: and, or, not
# Note: when dealing with NumPy arrays, use np.logical_and(arr1, arr2), np.logical_or(arr1, arr2) and np.logical_not(arr) for element-wise boolean operations
Conditional Statements:
if condition :
    expression 1
elif condition :
    expression 2
else :
    expression 3
Filtering Pandas DataFrame:
Example-1 Compare: select countries with an area over 8 million km2
Example-2 Boolean operators: also numpy.logical_and/or/not()

intermediate_ch4: While loop
while: repeat action as long as the condition is true
while condition :
    expression
for loop: for each var in seq, execute expression
for var in seq :
    expression
enumerate(obj): iterator over (index, value) pairs of an iterable
Loop over string
Loop over Dictionary: dict.items()
Loop over NumPy arrays: np.nditer(obj)
Loop over DataFrame: my_pandas_dataframe.iterrows()
Pandas method: apply(function): apply functions
Recap:
Dictionary: for key, val in dict.items() :
NumPy array: for var in np.nditer(my_array) :
DataFrame: for lab, row in my_pandas_dataframe.iterrows() :

intermediate_ch5: Random Numbers
import numpy as np
np.random.seed(num)
np.random.rand()  # random float in [0, 1)
np.random.randint(start, end)  # random integer, start inclusive, end exclusive
Throw coins 10 times, count number of times tails appeared, store this number in final_tails list. Repeat 100 times.

== & is
== is for value equality. Use it when you would like to know if two objects have the same value.
is is for reference equality. Use it when you would like to know if two references refer to the same object.
In general, when you are comparing something to a simple type, you are usually checking for value equality, so you should use ==.
For example, the intention of a test like x == 2 is probably to check whether x has a value equal to 2 (==), not whether x is literally referring to the same object as 2.
>>> a = 500
>>> b = 500
>>> a == b
True
>>> a is b
False
>>> a = [1, 2, 3]
>>> b = a
>>> b is a
True
>>> b == a
True
>>> b = a[:]
>>> b is a
False
>>> b == a
True
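
The same reference-versus-value distinction explains the ch2 note about y = x on lists. The following is a minimal sketch, not part of the original course material: it reuses a shortened version of the fam list, and the names alias and fam_copy are hypothetical, chosen only for illustration.

```python
# Shortened version of the fam list from the ch2 notes (illustrative data).
fam = ["liz", 1.73, "emma", 1.68]

# Plain assignment does not copy: both names refer to one list object.
alias = fam
alias[1] = 1.80          # in-place change made through the alias

print(alias is fam)      # True: same object
print(fam[1])            # 1.8: the change is visible through fam as well

# Slicing builds a new list object (list(fam) would do the same).
fam_copy = fam[:]
fam_copy[1] = 1.73       # change only the copy

print(fam_copy is fam)   # False: different objects
print(fam_copy == fam)   # False: the values now differ
print(fam[1])            # 1.8: the original list is unaffected
```

Note that fam[:] makes a shallow copy: for a nested list such as fam2, the inner lists would still be shared between the original and the copy.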

