HIGH PERFORMANCE NEXT GENERATION DEEP LEARNING CLUSTERS


Julie Bernauer, 2020/05/21

AI EVERYWHERE


DL TRAINING: FROM SINGLE GPU TO MULTI-NODE

ResNet-50 v1.5 training time, 2015-2019:

2015: 1x K80 (CUDA): 36,000 minutes (25 days)
2016: DGX-1P (NVLink): 1,200 minutes (20 hours)
2017: DGX-1V (Tensor Core): 480 minutes (8 hours)
2018: DGX-2H (NVSwitch): 70 minutes on MLPerf
2018: DGX cluster, at scale: 6.3 minutes on MLPerf
2019: DGX-2H (NVSwitch): 52.7 minutes on MLPerf
2019: DGX SuperPOD, at scale: 1.33 minutes on MLPerf
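The drop from 25 days to minutes comes from scaling data-parallel training across GPUs and nodes. As a minimal sketch of that pattern (not the MLPerf submission code), assuming PyTorch DistributedDataParallel with the NCCL backend and a launcher such as torchrun that sets RANK, WORLD_SIZE, and LOCAL_RANK in the environment:

import os

import torch
import torch.distributed as dist
import torchvision
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # NCCL is the usual backend for GPU clusters (NVLink inside a node,
    # InfiniBand across nodes); rendezvous info comes from the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # ResNet-50 to match the benchmark above; any module works.
    model = torchvision.models.resnet50().cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()

    # Stand-in data; a real job pairs a DataLoader with a
    # DistributedSampler so each rank trains on a distinct shard.
    images = torch.randn(32, 3, 224, 224, device=f"cuda:{local_rank}")
    labels = torch.randint(0, 1000, (32,), device=f"cuda:{local_rank}")

    for _ in range(10):
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()  # DDP all-reduces gradients across all ranks here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

The same script runs on 8 GPUs in a single DGX or across a full cluster; only the launch configuration changes.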


MODELS GETTING MORE COMPLEX


DATASETS GETTING LARGER

Unlabeled data:

- Language models: BooksCorpus (800M words), English Wikipedia (2.5B words), WebText (8M documents, 40 GB), C4 (Common Crawl, 745 GB)
- GANs: unlabeled images and videos
- Reinforcement learning: unsupervised self-play generates unlimited data

Labeled data:

- ImageNet (2012): 1.3M images, 1,000 categories
- Open Images (2019): 9M images, 6,000 categories
- Semi-autonomous vehicles: 0.5-1.1 TB of data for every 8 hours of driving
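These sizes translate directly into storage and I/O demands on the cluster. A back-of-the-envelope check, using only the figures above (the conversion to sustained bandwidth is an illustrative calculation, not a vendor number):

TB = 1e12
SHIFT_SECONDS = 8 * 3600  # one 8-hour drive

# Per-vehicle capture rate implied by 0.5-1.1 TB per 8 h of driving.
for label, captured in (("low estimate", 0.5 * TB), ("high estimate", 1.1 * TB)):
    rate_mb_s = captured / SHIFT_SECONDS / 1e6
    print(f"{label}: ~{rate_mb_s:.0f} MB/s sustained per vehicle")

# Reading C4 (745 GB) once per epoch means ~745 GB of storage traffic
# per pass, which has to reach the GPUs without stalling them.
print(f"C4 per epoch: {745e9 / 1e9:.0f} GB read per pass")

Even the high estimate is modest for a single vehicle (~38 MB/s), but fleets of vehicles and repeated training passes multiply it quickly.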


AI EVERYWHERE


NVIDIA DGX-2H SUPERPOD

An AI supercomputer

- Highest-performance AI research supercomputer: #20 on the Top500 list
- Top AI performance records: 96 DGX-2H nodes
- Fast multi-node interconnect: Mellanox EDR 100 Gb/s InfiniBand network
- AI infrastructure: modular and scalable architecture; integrated and optimized compute, networking, storage, and software
- NVIDIA-optimized software stacks: containers freely available on NGC
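Why the interconnect matters: at scale, every training step ends with a gradient all-reduce over the fabric. A rough cost model, assuming a bandwidth-only ring all-reduce over the EDR links named above and an FP16 ResNet-50 gradient buffer (both the cost model and the buffer size are assumptions for illustration, not measured SuperPOD numbers):

def ring_allreduce_seconds(size_bytes: float, n_ranks: int, link_gbps: float) -> float:
    """Bandwidth-only time for a ring all-reduce of size_bytes."""
    # Each rank sends and receives 2 * (N - 1) / N of the buffer.
    traffic = 2 * (n_ranks - 1) / n_ranks * size_bytes
    return traffic / (link_gbps * 1e9 / 8)

grad_bytes = 25.6e6 * 2  # ~25.6M ResNet-50 parameters in FP16 (assumption)
for ranks in (16, 256, 1536):
    t = ring_allreduce_seconds(grad_bytes, ranks, link_gbps=100)
    print(f"{ranks:5d} ranks: ~{t * 1e3:.1f} ms per step (bandwidth term only)")

In this simple model the bandwidth term flattens as the rank count grows, so per-step communication stays bounded on a fast fabric; real systems also contend with latency and overlap communication with backprop, which the model ignores.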


CLUSTERS AT NVIDIA

A wide variety of daily uses for SaturnV

- Supporting a wide community of users: supercomputer-scale continuous integration for software, research, automotive, and QA
- Need for performance at scale and flexibility

