Introduction to Big Data with Apache Spark
UC Berkeley
This Lecture
The Structure Spectrum
Files: Formats and Performance
Tabular Data: Examples, Challenges, pySpark DataFrames
Log Files
Review: The Big Picture
Extract
Transform
Load
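The three steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the lecture's own code: the input data, the `people` table, and the function names `extract`, `transform`, and `load` are all hypothetical, and the standard library's `csv` and `sqlite3` modules stand in for a real source and target system.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this would come from a file or a service.
RAW_CSV = """name,age
alice,34
bob,not available
carol,29
"""

def extract(text):
    """Extract: read raw rows out of the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: clean values and drop rows that cannot be parsed."""
    cleaned = []
    for row in rows:
        try:
            age = int(row["age"])
        except ValueError:
            continue  # skip rows whose age is not an integer
        cleaned.append((row["name"].strip().title(), age))
    return cleaned

def load(records, conn):
    """Load: write the cleaned records into a target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS people (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO people VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT name, age FROM people ORDER BY name").fetchall()
print(loaded)
```

Keeping each step in its own function makes it easy to swap the source or the target without touching the cleaning logic.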
Key Data Management Concepts
- A data model is a collection of concepts for describing data.
- A schema is a description of a particular collection of data, using a given data model.
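To make the distinction concrete, here is a minimal sketch that assumes a relational-style data model (rows with named, typed columns) and one hypothetical schema written in it. The names `PEOPLE_SCHEMA` and `conforms` are invented for illustration and do not come from the lecture.

```python
# Data model (assumed here: a relational-style model): data is a collection
# of rows, and every row has the same named, typed columns.
# Schema: the description of one particular collection in that model.
PEOPLE_SCHEMA = {"name": str, "age": int}  # hypothetical example schema

def conforms(row, schema):
    """Return True if the row has exactly the schema's columns with matching types."""
    return row.keys() == schema.keys() and all(
        isinstance(row[col], typ) for col, typ in schema.items()
    )

good_row = {"name": "Alice", "age": 34}
bad_row = {"name": "Bob", "age": "unknown"}  # wrong type for "age"

print(conforms(good_row, PEOPLE_SCHEMA))
print(conforms(bad_row, PEOPLE_SCHEMA))
```

The same data model can host many schemas; only the dictionary changes, not the checking logic.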
The Structure Spectrum
Structured (schema-first): Relational Database, Formatted Messages
Semi-Structured (schema-later): Documents XML, Tagged Text/Media
Unstructured (schema-never): Plain Text, Media
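As a rough illustration of the three points on the spectrum, the sketch below stores the same fact in each form using only the Python standard library. The record, the table name, and the regular expression are hypothetical choices made for this example.

```python
import json
import re
import sqlite3

# Structured (schema-first): the schema is declared before any data is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT NOT NULL, age INTEGER NOT NULL)")
conn.execute("INSERT INTO people VALUES (?, ?)", ("Alice", 34))
structured = conn.execute("SELECT name, age FROM people").fetchone()

# Semi-structured (schema-later): each record carries its own field names,
# and the reader decides afterwards which fields to rely on.
doc = json.loads('{"name": "Alice", "age": 34, "nickname": "Al"}')
semi_structured = (doc["name"], doc["age"])

# Unstructured (schema-never): plain text has no declared fields, so any
# structure has to be recovered by the reader, here with a regular expression.
text = "Alice, who is 34 years old, joined last week."
match = re.search(r"(\w+), who is (\d+) years old", text)
unstructured = (match.group(1), int(match.group(2)))

print(structured, semi_structured, unstructured)
```

Moving from left to right on the spectrum, less is guaranteed up front and more of the interpretation work shifts to the code that reads the data.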