Accelerating Data Science Workflows with RAPIDS
HIGH-PERFORMANCE DATA SCIENCE WITH RAPIDS
Zahra Ronaghi
AI Infrastructure Manager
END-TO-END ACCELERATED GPU DATA SCIENCE
Data Processing Evolution
Faster data access, less data movement
Hadoop Processing, Reading from disk
HDFS Read -> Query -> HDFS Write -> HDFS Read -> ETL -> HDFS Write -> HDFS Read -> ML Train
Spark In-Memory Processing
HDFS Read -> Query -> ETL -> ML Train
25-100x Improvement, Less code, Language flexible, Primarily In-Memory
Traditional GPU Processing
HDFS Read -> GPU Read -> Query -> CPU Write -> GPU Read -> ETL -> CPU Write -> GPU Read -> ML Train
5-10x Improvement, More code, Language rigid, Substantially on GPU
Data Movement and Transformation
The bane of productivity and performance
[Diagram: APP A and APP B, CPU and GPU. Labels: Load Data, Read Data, GPU Data, Copy & Convert (three occurrences)]
Data Movement and Transformation
What if we could keep data on the GPU?
[Diagram: same elements as on the previous slide: APP A and APP B, CPU and GPU, Load Data, Read Data, GPU Data, Copy & Convert]
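The contrast between "Copy & Convert" and keeping data in place can be illustrated with a small CPU-only analogy. This is a sketch, not RAPIDS or GPU code: it uses Python's standard-library array and memoryview as stand-ins for a shared buffer, and the function names copy_and_convert and share_buffer are hypothetical, chosen only for this illustration.

```python
from array import array


def copy_and_convert(values):
    """Hypothetical hand-off: APP B rebuilds the data in its own format.

    Every element is read and written again, so the cost grows with the data.
    """
    return [float(v) for v in values]


def share_buffer(values):
    """Hypothetical hand-off: APP B receives a view onto APP A's buffer.

    No element is copied; both sides look at the same memory.
    """
    return memoryview(values)


# APP A owns a buffer of three 8-byte floats.
app_a_data = array("d", [1.0, 2.0, 3.0])

copied = copy_and_convert(app_a_data)
shared = share_buffer(app_a_data)

# APP A updates its data in place.
app_a_data[0] = 99.0

print(copied[0])  # the copy still holds the old value
print(shared[0])  # the view reflects the update
```

The copy is independent of the original and must be redone whenever the data changes hands, while the view costs nothing to create. Keeping data resident on the GPU in a format that several applications understand aims at the same effect.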
Learning from Apache Arrow
From Apache Arrow Home Page -
Data Processing Evolution
Faster data access, less data movement
Hadoop Processing, Reading from disk
HDFS Read -> Query -> HDFS Write -> HDFS Read -> ETL -> HDFS Write -> HDFS Read -> ML Train
Spark In-Memory Processing
HDFS Read -> Query -> ETL -> ML Train
25-100x Improvement, Less code, Language flexible, Primarily In-Memory
Traditional GPU Processing
HDFS Read -> GPU Read -> Query -> CPU Write -> GPU Read -> ETL -> CPU Write -> GPU Read -> ML Train
5-10x Improvement, More code, Language rigid, Substantially on GPU
RAPIDS
Arrow Read -> Query -> ETL -> ML Train
50-100x Improvement, Same code, Language flexible, Primarily on GPU
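One way to read this slide is to count how often each pipeline moves data between compute stages. The sketch below assumes the stage order as read off the diagram (Query, ETL and ML Train are the compute stages, everything else is a read or write). It only counts steps and does not model how long any step takes, so it does not reproduce the improvement figures quoted on the slide.

```python
# Stage sequences as read off the slide's diagram (an assumption about the
# original layout, which was lost in text extraction).
PIPELINES = {
    "Hadoop": [
        "HDFS Read", "Query", "HDFS Write",
        "HDFS Read", "ETL", "HDFS Write",
        "HDFS Read", "ML Train",
    ],
    "Spark In-Memory": ["HDFS Read", "Query", "ETL", "ML Train"],
    "Traditional GPU": [
        "HDFS Read", "GPU Read", "Query", "CPU Write",
        "GPU Read", "ETL", "CPU Write",
        "GPU Read", "ML Train",
    ],
    "RAPIDS": ["Arrow Read", "Query", "ETL", "ML Train"],
}

# Stages that do actual work on the data; all other stages move it.
COMPUTE_STAGES = {"Query", "ETL", "ML Train"}


def data_movement_steps(stages):
    """Count the stages that only read or write data."""
    return sum(1 for stage in stages if stage not in COMPUTE_STAGES)


counts = {name: data_movement_steps(stages) for name, stages in PIPELINES.items()}

for name, moves in counts.items():
    compute = len(PIPELINES[name]) - moves
    print(f"{name}: {moves} data-movement steps, {compute} compute steps")
```

All four pipelines run the same three compute stages; they differ only in how many reads and writes surround them, which is the point of the slide's subtitle.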
Open Source Data Science Ecosystem
Familiar Python APIs
Data Preparation, Model Training, Visualization
Dask
Pandas: Analytics
Scikit-Learn: Machine Learning
NetworkX: Graph Analytics
PyTorch, Chainer, MXNet: Deep Learning
Matplotlib/Seaborn: Visualization
CPU Memory