EECS E6893 Big Data Analytics Spark Dataframe, Spark SQL, Hadoop metrics

Tejasri Kurapati tk2928@columbia.edu

9/30/2022

Agenda

Spark Dataframe
Spark SQL
Hadoop metrics

Spark Dataframe

An abstraction: an immutable, distributed collection of data, like an RDD
Data is organized into named columns, like a table in a database
Can be created from an RDD, a Hive table, or other data sources
Easy conversion to and from a pandas DataFrame

Spark Dataframe: read from csv file

Spark Dataframe: common operations

Spark Dataframe: common operations

Spark Dataframe: common operations

Spark Dataframe: common operations
