EECS E6893 Big Data Analytics

Spark Dataframe, Spark SQL, Hadoop metrics

Gudmundur Jonasson gmj2122@columbia.edu

9/29/2023

Agenda

● Spark Dataframe
● Spark SQL
● Hadoop metrics

Spark Dataframe

● An abstraction: an immutable, distributed collection of data, like an RDD
● Data is organized into named columns, like a table in a database
● Can be created from an RDD, a Hive table, or other data sources
● Easy conversion to and from a Pandas DataFrame

3

Spark Dataframe: read from csv file

4

Spark Dataframe: common operations
