Introduction to Big Data with Apache Spark
UC Berkeley
This Lecture
The Structure Spectrum
Files: Formats and Performance
Tabular Data: Examples, Challenges, pySpark DataFrames
Log Files
Review: The Big Picture
Extract
Transform
Load
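The three stages above can be sketched as a tiny pipeline. This is only an illustrative sketch, not the lecture's own code: the in-memory CSV, the function names `extract`, `transform`, and `load`, and the list used as a stand-in "warehouse" are all assumptions made for the example.

```python
import csv
import io

# Hypothetical raw input: a small CSV held in memory so the sketch is self-contained.
RAW = "name,age\nAda,36\nGrace,\nLinus,28\n"

def extract(text):
    """Extract: parse raw CSV text into a list of dict rows (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with a missing age and convert age to int."""
    return [{"name": r["name"], "age": int(r["age"])} for r in rows if r["age"]]

def load(rows, target):
    """Load: append the cleaned rows to a target store (here, a plain list)."""
    target.extend(rows)
    return target

warehouse = load(transform(extract(RAW)), [])
print(warehouse)
```

In a real deployment each stage would typically read from and write to external systems; the point here is only the separation of the three steps.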
Key Data Management Concepts
• A data model is a collection of concepts for describing data
• A schema is a description of a particular collection of data, using a given data model
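One way to make the distinction concrete is the sketch below, which uses Python's built-in `sqlite3` module rather than Spark. The relational data model supplies the concepts (tables, columns, types), while the `CREATE TABLE` statement is a schema for one particular collection of data. The `students` table and its columns are hypothetical, chosen only for illustration.

```python
import sqlite3

# An in-memory database: the relational data model provides the concepts
# (tables, columns, types) used to describe data.
conn = sqlite3.connect(":memory:")

# The schema: a description of one particular collection of data
# (a hypothetical "students" table) expressed in that data model.
conn.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL, gpa REAL)"
)
conn.execute("INSERT INTO students (id, name, gpa) VALUES (?, ?, ?)", (1, "Ada", 3.9))

# PRAGMA table_info returns one row per column: (cid, name, type, notnull, dflt_value, pk).
schema = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(students)")]
rows = conn.execute("SELECT name, gpa FROM students").fetchall()

print(schema)
print(rows)
```

The same data model could describe many different schemas; the schema is what ties the model to this specific table.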
The Structure Spectrum
Structured (schema-first): Relational Database, Formatted Messages
Semi-Structured (schema-later): Documents, XML, Tagged Text/Media
Unstructured (schema-never): Plain Text, Media
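As a rough illustration of the "schema-later" and "schema-never" ends of the spectrum, the sketch below infers a schema from JSON records after they have been collected, and treats plain text as data with no schema at all. The sample records, the sample sentence, and the helper `infer_schema` are hypothetical and are not part of the lecture.

```python
import json

# Semi-structured (schema-later): hypothetical JSON records whose fields vary.
raw_records = [
    '{"user": "ada", "age": 36}',
    '{"user": "linus", "email": "l@example.org"}',
]
records = [json.loads(s) for s in raw_records]

def infer_schema(recs):
    """Collect, for each field seen, the sorted names of the Python types observed."""
    seen = {}
    for rec in recs:
        for key, value in rec.items():
            seen.setdefault(key, set()).add(type(value).__name__)
    return {key: sorted(types) for key, types in seen.items()}

inferred = infer_schema(records)
print(inferred)

# Unstructured (schema-never): plain text carries no field structure;
# any structure (here, whitespace tokens) is imposed by the reader.
text = "Ada wrote to Linus on Tuesday."
tokens = text.split()
print(tokens)
```

In the schema-first case, by contrast, the description of the data exists before any data is stored, as with a table definition in a relational database.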