Netflix: Integrating Spark At Petabyte Scale
Ashwin Shankar, Cheolsoo Park
Outline
1. Netflix big data platform
2. Spark @ Netflix
3. Multi-tenancy problems
4. Predicate pushdown
5. S3 file listing
6. S3 insert overwrite
7. Zeppelin, IPython notebooks
8. Use case (Pig vs. Spark)
Netflix Big Data Platform
Netflix data pipeline
[Diagram: two flows into S3]
Event data: Cloud Apps -> Suro/Kafka -> Ursula -> S3 (500 bn/day, 15m)
Dimension data: Cassandra -> SSTables -> Aegisthus -> S3 (daily)
Netflix big data platform
[Diagram labels]
Tools
Big Data API/Portal
Service: Metacat
Clients
Gateways: Prod, Test
Clusters: Prod, Prod, Test, Adhoc
Data Warehouse
Our use cases
- Batch jobs (Pig, Hive)
  - ETL jobs
  - Reporting and other analysis
- Interactive jobs (Presto)
- Iterative ML jobs (Spark)
Spark @ Netflix
Mix of deployments
- Spark on Mesos
  - Self-serving AMI
  - Full BDAS (Berkeley Data Analytics Stack)
  - Online streaming analytics
- Spark on YARN
  - Spark as a service
  - YARN application on EMR Hadoop
  - Offline batch analytics
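As a rough illustration of the two deployment modes listed above, the sketch below builds `spark-submit` command lines that target either cluster manager. It is not taken from the slides: the helper `spark_submit_args`, the Mesos host and port, and the script names are all hypothetical. The `--master` and `--deploy-mode` flags are the standard ones in current Spark releases; the options Netflix actually used are not stated in this summary.

```python
import shlex


def spark_submit_args(master, app, deploy_mode=None, app_args=()):
    """Build a spark-submit argument list for a given cluster manager.

    Hypothetical helper for illustration; it only assembles the command
    and does not launch anything.
    """
    args = ["spark-submit", "--master", master]
    if deploy_mode is not None:
        args += ["--deploy-mode", deploy_mode]
    args.append(app)
    args.extend(app_args)
    return args


# Assumed host, port, and script names; replace with real values.
mesos_cmd = spark_submit_args(
    "mesos://mesos-master.example.com:5050", "streaming_job.py"
)
yarn_cmd = spark_submit_args("yarn", "batch_job.py", deploy_mode="cluster")

# Print shell-quoted command lines for inspection.
print(shlex.join(mesos_cmd))
print(shlex.join(yarn_cmd))
```

The application script stays the same in both cases; only the master URL and deploy mode change, which is what makes a mixed Mesos and YARN deployment workable from a single codebase.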