Scaling RAPIDS with Dask - Nvidia
Matthew Rocklin, Systems Software Manager
GTC San Jose 2019
PyData is Pragmatic, but Limited
How do we accelerate an existing software stack?
The PyData Ecosystem
- NumPy: Arrays
- Pandas: Dataframes
- Scikit-Learn: Machine Learning
- Jupyter: Interaction
- ... (many other projects)
Is well loved
- Easy to use
- Broadly taught
- Community Governed
But sometimes slow
- Single CPU core
- In-memory data
95% of the time, PyData is great
(and you can ignore the rest of this talk)
5% of the time, you want more performance
Scale up and out with RAPIDS and Dask
PyData
NumPy, Pandas, Scikit-Learn and many more
- Single CPU core
- In-memory data
Scale Up / Accelerate: RAPIDS and Others
Accelerated on single GPU
- NumPy -> CuPy/PyTorch/..
- Pandas -> cuDF
- Scikit-Learn -> cuML
- Numba -> Numba
Scale out / Parallelize: Dask
Multi-core and Distributed PyData
- NumPy -> Dask Array
- Pandas -> Dask DataFrame
- Scikit-Learn -> Dask-ML
- ... -> Dask Futures
Scale up and out: Dask + RAPIDS
- Multi-GPU
- On single Node (DGX)
- Or across a cluster
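The "scale out / parallelize" direction can be illustrated without a cluster. Dask's futures interface extends the style of the standard library's `concurrent.futures`, so the sketch below uses a standard-library `ThreadPoolExecutor` as a stand-in rather than Dask itself. The function names, the chunk size, and the sum-of-squares workload are hypothetical and chosen only for illustration; they do not come from the talk.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(values, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def partial_sum_of_squares(chunk):
    """Work done independently on one chunk (one task)."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(values, chunk_size=1000, max_workers=4):
    """Submit one task per chunk, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(partial_sum_of_squares, chunk)
            for chunk in chunked(values, chunk_size)
        ]
        # Each future resolves to one partial result; the final
        # reduction happens in the calling thread.
        return sum(f.result() for f in futures)


data = list(range(10_000))
total = parallel_sum_of_squares(data)
```

With Dask installed, a `dask.distributed.Client` exposes a similar `submit` method that returns futures with a `result()` method, so the same split, submit, and combine shape should carry over to multiple cores or machines.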
RAPIDS: GPU variants of PyData libraries
- NumPy -> CuPy, PyTorch, TensorFlow
  - Array computing
  - Mature due to deep learning boom
  - Also useful for other domains
  - Obvious fit for GPUs
- Pandas -> cuDF
  - Tabular computing
  - New development
  - Parsing, joins, groupbys
  - Not an obvious fit for GPUs
- Scikit-Learn -> cuML
  - Traditional machine learning
  - Somewhere in between
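As a rough illustration of the NumPy -> CuPy mapping, the sketch below writes a small array routine against a generic array module `xp` and runs it with NumPy. The `standardize` function and the `xp` parameter are hypothetical names introduced here, not part of the talk. The GPU path is shown only as a comment because it assumes CuPy and a CUDA-capable device are available.

```python
import numpy as np


def standardize(a, xp=np):
    """Column-wise standardization written against an array module `xp`.

    `xp` is expected to be NumPy, or another module offering the same
    `mean` and `std` functions (assumption: CuPy on a GPU machine).
    """
    mean = xp.mean(a, axis=0)
    std = xp.std(a, axis=0)
    return (a - mean) / std


# CPU path: plain NumPy arrays.
x = np.arange(12, dtype=np.float64).reshape(4, 3)
z = standardize(x)

# Assumed GPU path, only on a machine with CuPy and a CUDA GPU:
#   import cupy as cp
#   z_gpu = standardize(cp.asarray(x), xp=cp)
```

The point of the design is that the numerical code stays the same and only the array module changes, which is the sense in which the slide describes these libraries as GPU variants of their PyData counterparts.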