Spark: Big Data processing framework
Troy Baer (1), Edmon Begoli (2, 3), Cristian Capdevila (2), Pragnesh Patel (1), Junqi Yin (1)
(1) National Institute for Computational Sciences, University of Tennessee
(2) PYA Analytics
(3) Joint Institute for Computational Sciences, University of Tennessee
XSEDE Tutorial, July 26, 2015
Outline
• Overview of Big Data processing framework
• Introduction to Spark and Spark deployment
• Introduction to Spark SQL and Streaming
• Hands-on
• Spark machine learning and graph libraries
• Hands-on
Spark@NICS/JICS, XSEDE 2015
Overview of Big Data processing framework
A brief history of Hadoop
Why large-scale ('Big Data') analytics
• Heterogeneity of the architecture
  - Mixed loads
  - Mixed types
• Comprehensiveness of the analytic tools
  - SQL
  - Machine Learning
  - Data manipulation
  - Programming
  - External libraries
• "Safe bet" for the future
A general case for 'Big Data' in healthcare
[Figure: 'Big Data' platform with four data sources: image and sensor data, personal/genomic data, financial and administrative data, and clinical data]
Value of data aggregation
[Figure: value (vertical axis) versus complexity and size of data sets (horizontal axis). Data combinations shown, in order: Financial; Clinical + Financial; Patient Specific + Clinical + Financial; Public + Patient Specific + Clinical + Financial]
Hadoop: Big Data Platform
• Hadoop has become the de facto platform for storing and processing large amounts of data.
  1. Storage managers: HDFS, HBase, Kafka, etc.
  2. Processing frameworks: MapReduce, Spark, etc.
  3. Resource managers: YARN, Mesos, etc.
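To make the "processing framework" layer concrete, here is a minimal sketch of the MapReduce pattern (map, shuffle, reduce) as a word count in plain Python. It is an illustration only and does not use the Hadoop or Spark APIs. The function names and the in-memory sample input are assumptions made for this example; in a real deployment the input would come from a storage manager such as HDFS, and a resource manager such as YARN would schedule the work.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input standing in for blocks read from distributed storage.
sample_lines = ["spark and hadoop", "hadoop and yarn"]
counts = word_count(sample_lines)
print(counts)
```

The three functions are kept separate to mirror the division of labor in the pattern: user code supplies the map and reduce steps, while the grouping step is what the framework performs across machines.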