RAPIDS: OPEN SOURCE PYTHON DATA SCIENCE WITH GPU ACCELERATION AND DASK

Joe Eaton, Principal Systems Engineer for Graph and Data Analytics, NVIDIA

Sept 24, 2019

RAPIDS

End-to-End Accelerated GPU Data Science

Workflow: Data Preparation, Model Training, Visualization

Dask

cuDF, cuIO: Analytics

cuML: Machine Learning

cuGraph: Graph Analytics

PyTorch, Chainer, MxNet: Deep Learning

cuXfilter, pyViz: Visualization

GPU Memory

Data Processing Evolution

Faster data access, less data movement

Hadoop Processing, Reading from disk

HDFS Read -> Query -> HDFS Write -> HDFS Read -> ETL -> HDFS Write -> HDFS Read -> ML Train

Spark In-Memory Processing

HDFS Read -> Query -> ETL -> ML Train

25-100x Improvement, Less code, Language flexible, Primarily In-Memory

Traditional GPU Processing

HDFS Read -> GPU Read -> Query -> CPU Write -> GPU Read -> ETL -> CPU Write -> GPU Read -> ML Train

5-10x Improvement, More code, Language rigid, Substantially on GPU

RAPIDS

Arrow Read -> Query -> ETL -> ML Train

50-100x Improvement, Same code, Language flexible, Primarily on GPU
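
The slide's theme, "faster data access, less data movement", can be made concrete by counting the read and write stages in each of the four pipelines. The following is a minimal sketch, assuming the stage order below is a fair reading of the diagram (the slide only shows boxes). The `PIPELINES` dict and the `count_data_movement` helper are illustrative names, not part of RAPIDS or any other library.

```python
# Stage sequences as read from the "Data Processing Evolution" slide.
# Assumption: the ordering is an interpretation of the diagram,
# not something the slide states in text.
PIPELINES = {
    "Hadoop": ["HDFS Read", "Query", "HDFS Write", "HDFS Read", "ETL",
               "HDFS Write", "HDFS Read", "ML Train"],
    "Spark": ["HDFS Read", "Query", "ETL", "ML Train"],
    "Traditional GPU": ["HDFS Read", "GPU Read", "Query", "CPU Write",
                        "GPU Read", "ETL", "CPU Write", "GPU Read", "ML Train"],
    "RAPIDS": ["Arrow Read", "Query", "ETL", "ML Train"],
}


def count_data_movement(stages):
    """Count stages whose name ends in Read or Write (data movement, not compute)."""
    return sum(1 for stage in stages if stage.endswith(("Read", "Write")))


movement = {name: count_data_movement(stages) for name, stages in PIPELINES.items()}

for name, n in movement.items():
    print(f"{name}: {n} data-movement steps out of {len(PIPELINES[name])} stages")
```

Under that reading, the Hadoop and traditional GPU pipelines spend most of their stages moving data, while Spark and RAPIDS each read once and then keep the data in memory.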

Faster Speeds, Real-World Benefits

Charts (time in seconds, shorter is better):

cuIO/cuDF - Load and Data Preparation

cuML - XGBoost

End-to-End: 8762, 6148, 3925, 3221, 322, 213

Legend: cuIO/cuDF (Load and Data Prep), Data Conversion, XGBoost

Benchmark

200GB CSV dataset; Data prep includes joins, variable transformations

CPU Cluster Configuration

CPU nodes (61 GiB memory, 8 vCPUs, 64-bit platform), Apache Spark v2.3, XGBoost 0.9

DGX Cluster Configuration

5x DGX-1 on InfiniBand network, Ubuntu 16.04, CUDA 10, Driver 410.48, NCCL 2.4.7
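
The six timings on this slide (8762, 6148, 3925, 3221, 322, 213 seconds) can be turned into relative speedups with simple division. The sketch below is hedged: the extracted text does not say which hardware configuration each bar belongs to, so it only compares every value with the slowest one. The variable names are illustrative.

```python
# Timings in seconds, copied from the slide in the order they appear.
# Assumption: the text does not label the bars, so no configuration
# names are attached to the values here.
timings_s = [8762, 6148, 3925, 3221, 322, 213]

slowest = max(timings_s)

# Speedup of each run relative to the slowest run, rounded to one decimal.
speedups = [round(slowest / t, 1) for t in timings_s]

for t, s in zip(timings_s, speedups):
    print(f"{t:>5} s  ->  {s}x faster than the slowest run")
```

The fastest run is roughly 41x quicker than the slowest, and the gap between the fourth and fifth values (3221 s to 322 s) is a factor of about ten.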

RAPIDS Core

Open Source Data Science Ecosystem

Familiar Python APIs

Workflow: Data Preparation, Model Training, Visualization

Dask

Pandas: Analytics

Scikit-Learn: Machine Learning

NetworkX: Graph Analytics

PyTorch, Chainer, MxNet: Deep Learning

Matplotlib/Seaborn: Visualization

CPU Memory

RAPIDS

End-to-End Accelerated GPU Data Science

Workflow: Data Preparation, Model Training, Visualization

Dask

cuDF, cuIO: Analytics

cuML: Machine Learning

cuGraph: Graph Analytics

PyTorch, Chainer, MxNet: Deep Learning

cuXfilter, pyViz: Visualization

GPU Memory
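
The two ecosystem slides pair each CPU library with a GPU counterpart (Pandas with cuDF, Scikit-Learn with cuML, NetworkX with cuGraph) under the heading "Familiar Python APIs". The sketch below illustrates that idea for the Pandas/cuDF pair with a small join, a variable transformation, and an aggregation. It is not taken from the talk: the tables, column names, and the `xdf` alias are made up for illustration, and the fallback import is an assumption so that the same script also runs on a machine without a GPU.

```python
try:
    import cudf as xdf  # GPU DataFrame library from RAPIDS
except ImportError:
    import pandas as xdf  # CPU fallback; the calls below exist in both libraries

# Hypothetical example data, not from the benchmark in the talk.
orders = xdf.DataFrame({
    "customer_id": [1, 2, 1, 3],
    "amount": [10.0, 25.0, 5.0, 40.0],
})
customers = xdf.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["east", "west", "east"],
})

# Join: attach each order's region.
joined = orders.merge(customers, on="customer_id", how="left")

# Variable transformation: derive a new column from an existing one.
joined["amount_doubled"] = joined["amount"] * 2

# Aggregation: total order amount per region.
totals = joined.groupby("region")["amount"].sum()
print(totals)
```

Only the import line differs between the CPU and GPU versions, which is the sense in which the deck's "same code" claim is usually meant. Not every pandas feature has a cuDF equivalent, so larger programs may need more changes than this.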

Dask
