Data Science in the Wild
Lecture 11: In-memory Parallel Processing in Spark
Eran Toch
Data Science in the Wild, Spring 2019
The Scale of Big Data
Agenda
1. Spark
2. Spark DataFrames
3. Spark SQL
4. Machine Learning on Spark
5. ML Pipelines
Spark
Technological Architecture
[Diagram: the Hadoop technology stack. Storage layer: Hadoop Distributed File System (HDFS). Processing layer: MapReduce / YARN. Components above these layers: In Memory Data Flow, Data Warehouse, NoSQL, Scripting (Pig).]