Analysis of unstructured data .wroc.pl
27.10.2017
3_pandas
Analysis of unstructured data
Lecture 3 - Introduction to pandas module (continued)
Janusz Szwabiski
Overview:
- Iteration over data structures
- Sorting
- Working with text data
- Working with missing data
- Grouping of data
- Merge, join and concatenate
- Time series
- Visualization
References:
homepage of the Pandas project: ()
In [1]:
%matplotlib inline
import numpy as np
import pandas as pd
Iteration over data structures
- the behavior of basic iteration over pandas objects depends on their type
- a Series is regarded as array-like, so iteration produces its values
- a DataFrame (and a Panel) follows the dict-like convention of iterating over the "keys" of the object
- in short, basic iteration (for i in object:) produces:
  - Series: values
  - DataFrame: column labels
  - Panel: item labels
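A minimal sketch contrasting the two conventions side by side. The small objects s and frame are invented for illustration only.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=['x', 'y', 'z'])
frame = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# A Series is array-like: iterating over it yields the values (10, 20, 30),
# not the index labels 'x', 'y', 'z'
series_items = [v for v in s]
print(series_items)

# A DataFrame is dict-like: iterating over it yields the column labels
frame_items = [c for c in frame]
print(frame_items)
```

To iterate over the index labels of a Series instead, iterate over s.index explicitly.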
In [2]:
df = pd.DataFrame({'col1' : np.random.randn(3), 'col2' : np.random.randn(3)},
                  index=['a', 'b', 'c'])
In [3]:
df
Out[3]:
       col1      col2
a -0.320593 -0.205749
b -1.001097  0.730810
c -1.466919  0.784842
In [4]:
for col in df:
    print(col)
col1
col2
Pandas also offers some additional methods that support iteration:
- iteritems() - iterate over (key, value) pairs (deprecated in favor of items() and removed in pandas 2.0)
- iterrows() - iterate over the rows of a DataFrame as (index, Series) pairs
- itertuples() - iterate over the rows as named tuples of the values (a lot faster than iterrows())
In [5]:
df
Out[5]:
       col1      col2
a -0.320593 -0.205749
b -1.001097  0.730810
c -1.466919  0.784842
In [6]:
for key, val in df.items():  # items() replaces iteritems(), which was removed in pandas 2.0
    print("Key: ", key)
    print("Value:")
    print(val)
    print('-'*10)
Key:  col1
Value:
a   -0.320593
b   -1.001097
c   -1.466919
Name: col1, dtype: float64
----------
Key:  col2
Value:
a   -0.205749
b    0.730810
c    0.784842
Name: col2, dtype: float64
----------
In [7]:
for ind, ser in df.iterrows():  # each row becomes a Series whose name is the row's index label
    print("Index: ", ind)
    print("Series:")
    print(ser)
    print('-'*10)
Index:  a
Series:
col1   -0.320593
col2   -0.205749
Name: a, dtype: float64
----------
Index:  b
Series:
col1   -1.001097
col2    0.730810
Name: b, dtype: float64
----------
Index:  c
Series:
col1   -1.466919
col2    0.784842
Name: c, dtype: float64
----------
In [8]:
for tup in df.itertuples():  # each row becomes a named tuple
    print("Value:")
    print(tup)
    print('-'*10)
Value:
Pandas(Index='a', col1=-0.32059302248966487, col2=-0.20574850372838482)
----------
Value:
Pandas(Index='b', col1=-1.0010973282656557, col2=0.73081019284989368)
----------
Value:
Pandas(Index='c', col1=-1.4669194859822687, col2=0.78484197850196502)
----------
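One practical difference between iterrows() and itertuples() is worth seeing directly: because iterrows() turns each row into a Series, values from columns of different types are upcast to a common dtype, while itertuples() keeps the values as they are and lets you read them by attribute. A short sketch with an invented DataFrame mixed; index and name are standard itertuples() parameters.

```python
import pandas as pd

mixed = pd.DataFrame({'int_col': [1, 2], 'float_col': [1.5, 2.5]})

# iterrows(): each row is a Series, so the int and float values share one dtype
_, first_row = next(mixed.iterrows())
print(first_row.dtype)  # float64

# itertuples(): each row is a named tuple whose fields are accessible by attribute
first_tup = next(mixed.itertuples())
print(first_tup.Index, first_tup.int_col, first_tup.float_col)

# index=False drops the Index field, name=None returns plain tuples
plain = list(mixed.itertuples(index=False, name=None))
print(plain)
```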
Warning #1
- iterating through pandas objects is generally slow
- in many cases it is not needed and can be avoided with one of the following approaches:
  - look for a vectorized solution
  - when a function cannot work on the full DataFrame/Series at once, use apply() instead of iterating over the values
  - if you need to do iterative manipulations on the values and performance matters, consider writing the inner loop in e.g. Cython or Numba ( ())
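A minimal sketch comparing an explicit row loop with the two recommended alternatives. The DataFrame nums is invented for illustration, and the row-wise sum is deliberately trivial so all three results can be compared; in practice apply() is only worth reaching for when the function cannot be vectorized.

```python
import numpy as np
import pandas as pd

nums = pd.DataFrame({'col1': np.arange(5.0), 'col2': np.arange(5.0) * 2})

# Slow pattern: an explicit Python loop over the rows
loop_result = [row['col1'] + row['col2'] for _, row in nums.iterrows()]

# Preferred: vectorized arithmetic on whole columns at once
vectorized = nums['col1'] + nums['col2']

# Fallback: apply() with axis=1 calls the function once per row
applied = nums.apply(lambda row: row['col1'] + row['col2'], axis=1)

print(vectorized.tolist())
```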
Warning #2
- do not modify something you are iterating over
- the iterator usually returns a copy rather than a view, so writing to it will have no effect
In [9]:
df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})
In [10]:
for index, row in df.iterrows():
    row['a'] = 10
In [11]:
df
Out[11]:
   a  b
0  1  a
1  2  b
2  3  c
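If the goal is to actually change the data, the write has to go to the DataFrame itself rather than to the row copy. A sketch of two ways to do that, using the same small DataFrame as in the example above: label-based assignment with .at inside a loop, or (usually better) a single vectorized assignment.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})

# Writing to the row yielded by iterrows() only changes a copy
for index, row in df.iterrows():
    row['a'] = 10
unchanged = df['a'].tolist()   # still 1, 2, 3

# Writing through the DataFrame by label does change the data
for index in df.index:
    df.at[index, 'a'] = 10
after_at = df['a'].tolist()    # now 10, 10, 10

# A single vectorized assignment is simpler and faster than a loop
df['a'] = df['a'] * 2
print(unchanged, after_at, df['a'].tolist())
```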
Sorting
- two kinds of sorting: by index and by value
- since version 0.17.0 of pandas, all sorting methods return a new object by default and do not operate in place
- this behavior can be changed by passing the flag inplace=True
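A small sketch of the default versus in-place behavior, using an invented one-column DataFrame: sort_values() returns a sorted copy and leaves the original untouched, whereas sort_index(inplace=True) reorders the object itself and returns None.

```python
import pandas as pd

df = pd.DataFrame({'col1': [3, 1, 2]}, index=['c', 'a', 'b'])

# Default: a new, sorted object is returned and df keeps its original order
sorted_df = df.sort_values(by='col1')
original_order = df.index.tolist()

# inplace=True: df itself is reordered and the method returns None
result = df.sort_index(inplace=True)

print(sorted_df.index.tolist(), original_order, result, df.index.tolist())
```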
Sorting by index
In [12]:
df = pd.DataFrame({'col1' : np.random.randn(3), 'col2' : np.random.randn(3)}, index=['a', 'b', 'c'])
In [13]:
df
Out[13]:
       col1      col2
a -0.617963 -0.239719
b  1.297180  0.406090
c -1.641579  0.737969
In [14]:
unsorted_df = df.reindex(index=['c', 'a', 'b'], columns=['col2', 'col1'])
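A sketch of how a frame like unsorted_df can be put back in label order with sort_index(). The block re-creates df so it runs on its own (the random values will differ from the output shown), and uses the standard ascending and axis parameters of sort_index().

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': np.random.randn(3), 'col2': np.random.randn(3)},
                  index=['a', 'b', 'c'])
unsorted_df = df.reindex(index=['c', 'a', 'b'], columns=['col2', 'col1'])

# Sort the rows by their index labels (axis=0 is the default)
by_rows = unsorted_df.sort_index()
# Sort the rows in descending label order
by_rows_desc = unsorted_df.sort_index(ascending=False)
# Sort the columns by their labels instead of the rows
by_cols = unsorted_df.sort_index(axis=1)

print(by_rows.index.tolist(), by_rows_desc.index.tolist(), by_cols.columns.tolist())
```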