Intro to DataFrames and Spark SQL - GitHub Pages

Spark SQL

• Part of the core distribution since 1.0 (April 2014)
• Runs SQL / HiveQL queries, optionally alongside or replacing existing Hive deployments
• Improved multi-version support in 1.4

DataFrames API

• Enable wider audiences beyond "Big Data" engineers to leverage the power of distributed processing
• Inspired by data frames in R and Python (Pandas)
• Designed from the ground up to support modern big data and data science applications
• Extension to the existing RDD API

See: blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.html
