732A54/TDDE31 Big Data Analytics - LiU
Introduction to Spark SQL
updated: 2020-04-20
DataFrames
A DataFrame is a distributed collection of data organized into named columns. It is conceptually equivalent to a table in a relational database or a data frame in R/Python, but with richer optimizations under the hood. DataFrames can be constructed from a wide array of sources such as: structured data files, tables in Hive, external databases, or existing RDDs.
SQLContext & HiveContext
- Start by obtaining a SparkContext object, then create an SQLContext from it:
    sc = SparkContext()
    sqlContext = SQLContext(sc)
- HiveContext provides additional features on top of SQLContext (likely not needed for the lab assignment):
    from pyspark.sql import HiveContext
    sqlContext = HiveContext(sc)
Imports
Don't forget to import relevant classes first!
    from pyspark import SparkContext
    from pyspark.sql import SQLContext, Row
    from pyspark.sql import functions as F
Create a DataFrame from an RDD
- Two ways:
    - Inferring the schema using reflection
    - Specifying the schema programmatically
- Then register the table