Spark SQL is the Spark component for structured data processing.

It provides a programming abstraction called Dataset and can also act as a distributed SQL query engine.

The input data can be queried by using either ad-hoc methods or an SQL-like language.

The interfaces of Spark SQL give Spark more information about the structure of both the data and the computation being performed.

Spark SQL uses this extra information to perform additional optimizations through Catalyst, an "SQL-like" query optimizer.

As a result, programs based on Datasets are usually faster than equivalent RDD-based programs.

RDD vs Dataset/DataFrame

RDD: unstructured, a distributed list of objects

Dataset/DataFrame: structured, approximately a distributed SQL table

Dataset

A Dataset is a distributed collection of structured data.

It provides the benefits of RDDs: strong typing and the ability to use powerful lambda functions.

It also provides the benefits of Spark SQL's optimized execution engine, which exploits the information about the data structure to compute an optimized execution plan before executing the code.
