Advanced Data Science on Spark
Reza Zadeh
@Reza_Zadeh
Data Science Problem
Data growing faster than processing speeds
Only solution is to parallelize on large clusters
- Wide use in both enterprises and web industry
How do we program these things?
Use a Cluster
Convex Optimization
Matrix Factorization
Machine Learning
Numerical Linear Algebra
Large Graph analysis
Streaming and online algorithms
Following lectures on
Slides at
Outline
Data Flow Engines and Spark
The Three Dimensions of Machine Learning
Built-in Libraries
MLlib + {Streaming, GraphX, SQL}
Future of MLlib
Traditional Network Programming
Message-passing between nodes (e.g. MPI)
Very difficult to do at scale:
- How to split problem across nodes?
  - Must consider network & data locality
- How to deal with failures? (inevitable at scale)
- Even worse: stragglers (node not failed, but slow)
- Ethernet networking not fast
- Have to write programs for each machine
Rarely used in commodity datacenters
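
To make the difficulties above concrete, here is a minimal sketch of hand-written message passing. It is not MPI and not taken from the lecture: threads stand in for cluster nodes, `queue.Queue` objects stand in for network messages, and the names `worker` and `distributed_sum` are hypothetical. It assumes `num_workers` is a positive integer.

```python
import queue
import threading


def worker(inbox, outbox):
    """Simulated node: receive one chunk, compute a partial sum, send it back."""
    chunk = inbox.get()
    outbox.put(sum(chunk))


def distributed_sum(data, num_workers):
    """Sum `data` by manually partitioning it across simulated nodes."""
    if not data:
        return 0

    # The programmer decides how to split the problem across nodes.
    chunk_size = (len(data) + num_workers - 1) // num_workers
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    outbox = queue.Queue()
    threads = []
    for chunk in chunks:
        inbox = queue.Queue()
        t = threading.Thread(target=worker, args=(inbox, outbox))
        t.start()
        inbox.put(chunk)  # explicit "send" to one node
        threads.append(t)

    # Explicit "receive" from every node. There is no timeout or retry here,
    # so a failed node would block this loop forever and a straggler would
    # delay the whole result.
    partials = [outbox.get() for _ in chunks]
    for t in threads:
        t.join()
    return sum(partials)
```

Even in this toy version, the partitioning, the sends, and the receives are all written by hand, and nothing handles failures or slow nodes.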