Speed Up Your Data Processing

Parallel and Asynchronous Programming in Data Science

By: Chin Hwee Ong (@ongchinhwee)

23 July 2020

About me

Ong Chin Hwee 王敬惠

● Data Engineer @ ST Engineering

● Background in aerospace engineering + computational modelling

● Contributor to pandas 1.0 release

● Mentor team at BigDataX

@ongchinhwee

A typical data science workflow

1. Extract raw data

2. Process data

3. Train model

4. Evaluate and deploy model

Bottlenecks in a data science project

● Lack of data / Poor quality data

● Data processing

  ○ The 80/20 data science dilemma

    ■ In reality, it's closer to 90/10

Data Processing in Python

● For loops in Python

  ○ Run by the interpreter, not compiled to machine code

  ○ Slow compared with equivalent loops in C

a_list = []
for i in range(100):
    a_list.append(i*i)
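To get a feel for the per-iteration cost of the loop above, it can be timed with the standard-library `timeit` module. The sketch below is an illustration only, not taken from the slides: the function names `squares_loop` and `squares_comprehension` and the repeat count are assumptions, and the list comprehension is included purely as a point of comparison. Actual timings depend on the machine and the Python version.

```python
import timeit


def squares_loop(n):
    """Build the list of squares with an explicit for loop, as on the slide."""
    a_list = []
    for i in range(n):
        a_list.append(i * i)
    return a_list


def squares_comprehension(n):
    """Build the same list with a list comprehension (hypothetical comparison)."""
    return [i * i for i in range(n)]


if __name__ == "__main__":
    n = 100  # same range as the slide example

    # Both versions must produce identical results before timing them.
    assert squares_loop(n) == squares_comprehension(n)

    # number=10_000 is an arbitrary repeat count chosen for this sketch.
    loop_time = timeit.timeit(lambda: squares_loop(n), number=10_000)
    comp_time = timeit.timeit(lambda: squares_comprehension(n), number=10_000)

    print(f"for loop:           {loop_time:.4f} s")
    print(f"list comprehension: {comp_time:.4f} s")
```

Both versions still run element by element in the interpreter, so any difference between them is small next to the gap with compiled code.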
