Hervé Mignot EQUANCY
Modern pandas
Building Pipelines with Python
[Figure: tools positioned by data size and pipeline complexity.
Data size axis: x100 K, x1 M, x10 M, x100 M.
Pipeline complexity axis: Simple (single process, simple steps); Intermediate (few processes, complex steps); Complex (many processes, complex steps).
Tools shown: Pandas, Vaex*, Dask | Pandas on Zak, PySpark, Distributed Machine Learning, Airflow, Luigi.]
* See the slides presented at PyParis 2018 here:
Our tools
Using pandas to build data transformation pipelines:
• Method chaining
• Brackets: ( )
• lambda
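As a rough sketch of how these three tools might work together, assume a small DataFrame with columns `city` and `sales` (names and values invented here for illustration, not taken from the slides). The outer brackets let the chain span several lines, and each lambda receives the DataFrame as it stands at that step of the chain.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
raw = pd.DataFrame({
    "city": ["Paris", "Lyon", "Paris", "Nice"],
    "sales": [100, 50, 200, 25],
})

# Brackets: wrapping the whole chain in ( ) allows one method per line.
# lambda: each lambda is called with the intermediate DataFrame.
result = (
    raw
    .assign(sales_k=lambda d: d["sales"] / 1000)  # derive a new column
    .loc[lambda d: d["sales"] >= 50]              # filter the intermediate result
    .groupby("city", as_index=False)["sales_k"].sum()
)
print(result)
```

Without the lambdas, the filter would have to refer to a named variable, which forces the chain to be broken into separate statements.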
Full credits to Tom Augspurger (@TomAugspurger)
Effective Pandas
• Effective Pandas
• Method Chaining
• Indexes
• Fast Pandas
• Tidy Data
• Visualization
• Time Series
Modern Pandas: Method Chaining
Method chaining composes successive function applications on an object: each method returns a new object, on which the next method is called.
Many data library APIs are inspired by this functional programming pattern:
• dplyr (R)
• Apache Spark (Scala, Python, R)
• ...
Example (reading a CSV file, renaming a column, and keeping the first 6 rows in a pandas DataFrame):
df = pd.read_csv('myfile.csv').rename(columns={'old_col': 'new_col',}).head(6)
vs.
df = pd.read_csv('myfile.csv')
df = df.rename(columns={'old_col': 'new_col',})
df = df.head(6)
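To check that the two forms give the same result, the sketch below runs both on an in-memory CSV instead of 'myfile.csv'. The second column and all values are invented for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for 'myfile.csv'.
csv_text = "old_col,other\n" + "\n".join(f"{i},{i * 10}" for i in range(10))

# Chained version: one expression, no intermediate variables.
chained = (
    pd.read_csv(io.StringIO(csv_text))
    .rename(columns={"old_col": "new_col"})
    .head(6)
)

# Step-by-step version: the same operations with repeated reassignment.
stepwise = pd.read_csv(io.StringIO(csv_text))
stepwise = stepwise.rename(columns={"old_col": "new_col"})
stepwise = stepwise.head(6)

# Both approaches should produce identical DataFrames.
print(chained.equals(stepwise))
```

The chained form reads top to bottom as a list of transformations and leaves no half-processed `df` behind if a step is skipped or rerun.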