GPU Accelerated Data Analytics in Python
Mads R. B. Kristensen, NVIDIA
Scale up and out with RAPIDS and Dask
RAPIDS and Others
Accelerated on single GPU
NumPy -> CuPy/PyTorch/...
Pandas -> cuDF
Scikit-Learn -> cuML
Numba -> Numba
Dask + RAPIDS
Multi-GPU
On a single node (DGX) or across a cluster
Scale Up / Accelerate
PyData
NumPy, Pandas, Scikit-Learn and many more
Single CPU core
In-memory data
Dask
Multi-core and Distributed PyData
NumPy -> Dask Array
Pandas -> Dask DataFrame
Scikit-Learn -> Dask-ML
... -> Dask Futures
Scale out / Parallelize
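A minimal sketch of the "NumPy -> CuPy" mapping above, assuming only that CuPy mirrors the part of the NumPy API used here. The helper names `get_array_module` and `column_means` are hypothetical and not from the slides. The code falls back to NumPy when CuPy or a usable GPU is unavailable, so the same function body runs on either backend.

```python
import numpy as np


def get_array_module():
    """Return CuPy if it imports and can reach a GPU, otherwise NumPy."""
    try:
        import cupy as cp
        cp.zeros(1)  # touches the device; raises if no usable GPU is present
        return cp
    except Exception:
        return np


def column_means(xp, rows, cols):
    # The same source runs on both backends because only calls shared
    # by NumPy and CuPy (arange, reshape, mean) are used.
    a = xp.arange(rows * cols, dtype=xp.float64).reshape(rows, cols)
    return a.mean(axis=0)


xp = get_array_module()
means = column_means(xp, 4, 3)

# CuPy arrays live in GPU memory; .get() copies them back to the host.
result = means.get() if hasattr(means, "get") else means
print(xp.__name__, result)
```

Passing the array module in as `xp` is one common way to keep code backend-agnostic; it is a design choice for this sketch, not something the slides prescribe.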
CPU vs GPU
DOI: 10.1016/j.cam.2013.12.032.
Data Processing Evolution
Faster data access, less data movement
Hadoop Processing, Reading from Disk
HDFS Read -> Query -> HDFS Write -> HDFS Read -> ETL -> HDFS Write -> HDFS Read -> ML Train
Spark In-Memory Processing
HDFS Read -> Query -> ETL -> ML Train
25-100x improvement, less code, language flexible, primarily in-memory
Traditional GPU Processing
HDFS Read -> GPU Read -> Query -> CPU Write -> GPU Read -> ETL -> CPU Write -> GPU Read -> ML Train
5-10x improvement, more code, language rigid, substantially on GPU
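To make "less data movement" concrete, the sketch below counts the data-movement stages in each pipeline on this slide. The stage lists are read off the slide's diagram. Treating every stage other than Query, ETL, and ML Train as data movement is an assumption made for this illustration, and the names `PIPELINES` and `movement_steps` are hypothetical.

```python
# Stage sequences as read from the slide's three pipelines.
PIPELINES = {
    "Hadoop": [
        "HDFS Read", "Query", "HDFS Write", "HDFS Read",
        "ETL", "HDFS Write", "HDFS Read", "ML Train",
    ],
    "Spark": ["HDFS Read", "Query", "ETL", "ML Train"],
    "Traditional GPU": [
        "HDFS Read", "GPU Read", "Query", "CPU Write", "GPU Read",
        "ETL", "CPU Write", "GPU Read", "ML Train",
    ],
}

# Assumption: these three stages are compute; everything else moves data.
COMPUTE_STAGES = {"Query", "ETL", "ML Train"}


def movement_steps(stages):
    """Count the stages that read, write, or copy data."""
    return sum(1 for stage in stages if stage not in COMPUTE_STAGES)


counts = {name: movement_steps(stages) for name, stages in PIPELINES.items()}
for name, n in counts.items():
    print(f"{name}: {n} data-movement steps")
```

The counts only compare the number of steps; they say nothing about the cost of each step, which differs widely between a disk read and a host-to-device copy.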
Data Movement and Transformation
What if we could keep data on the GPU?
[Diagram: APP A and APP B pass data between CPU and GPU. Labels: Load Data, Read Data, GPU Data, Copy & Convert (repeated).]