EECS E6893 Big Data Analytics Spark Dataframe, Spark SQL, Hadoop metrics

Gudmundur Jonasson gmj2122@columbia.edu

9/29/2023


Agenda

Spark Dataframe
Spark SQL
Hadoop metrics


Spark Dataframe

An abstraction: an immutable, distributed collection of data, like an RDD
Data is organized into named columns, like a table in a database
Can be created from an RDD, a Hive table, or other data sources
Easy conversion to and from a Pandas Dataframe


Spark Dataframe: read from csv file


Spark Dataframe: common operations


