Introduction to Big Data with Apache Spark
UC Berkeley
This Lecture
Structured Data and Relational Databases
The Structured Query Language (SQL)
SQL and pySpark Joins
Review: Key Data Management Concepts
• A data model is a collection of concepts for describing data
• A schema is a description of a particular collection of data, using a given data model
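As a minimal sketch of the distinction, assuming the relational data model and Python's built-in sqlite3 module: the data model supplies the concepts (tables, rows, typed columns), while the schema is the CREATE TABLE statement that describes one particular collection. The students table, its columns, and the sample rows are hypothetical and do not come from the lecture.

```python
import sqlite3

# Data model: relational (tables, rows, typed columns).
# Schema: the description of one particular table under that model.
# The table name and columns are hypothetical, chosen only for illustration.
SCHEMA = """
CREATE TABLE students (
    sid   INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    gpa   REAL
)
"""


def build_database():
    """Create an in-memory database holding the hypothetical students table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO students (sid, name, gpa) VALUES (?, ?, ?)",
        [(1, "Ada", 3.9), (2, "Grace", 3.7)],
    )
    return conn


def describe_schema(conn):
    """Return (column name, declared type) pairs read back from the database."""
    rows = conn.execute("PRAGMA table_info(students)").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2]) for row in rows]


conn = build_database()
print(describe_schema(conn))
```

The same data model could describe many different schemas; only the CREATE TABLE text would change.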
Whither Structured Data?
• Conventional wisdom:
  - Only 20% of data is structured.
• Decreasing due to:
  - Consumer applications
  - Enterprise search
  - Media applications
The Structure Spectrum
• Structured (schema-first): Relational Database, Formatted Messages [this lecture]
• Semi-Structured (schema-later): Documents, XML, JSON, Tagged Text/Media
• Unstructured (schema-never): Plain Text, Media
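The three points on the spectrum can be sketched in Python, assuming only the standard library. This is an illustration under stated assumptions, not the lecture's own code: the msgs table, the function names, and the sample inputs are all hypothetical.

```python
import json
import sqlite3


def schema_first(record):
    """Structured: the schema exists before the data and rejects rows that violate it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE msgs (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
    try:
        conn.execute("INSERT INTO msgs (id, body) VALUES (?, ?)", record)
        return True
    except sqlite3.IntegrityError:
        # Raised here when body is NULL, because the schema declares it NOT NULL.
        return False


def schema_later(text):
    """Semi-structured: the structure travels with the data and is discovered on read."""
    doc = json.loads(text)
    return sorted(doc.keys())


def schema_never(text):
    """Unstructured: no schema at all, so the program can only treat it as raw text."""
    return text.split()


print(schema_first((1, "hello")))
print(schema_first((2, None)))
print(schema_later('{"id": 1, "body": "hello", "tags": ["a"]}'))
print(schema_never("hello big data"))
```

In the schema-first case the database enforces the structure at write time; in the other two cases any checking is left to the code that reads the data.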