Big Data Tutorial, Week 2: Spark

EECS E6893 Big Data Analytics: Spark Introduction

Qingcheng Yu, qy2281@columbia.edu


Agenda

Functional programming in Python

Lambda Expression

Crash course in Spark (PySpark)

Resilient Distributed Dataset (RDD)

Useful RDD operations

Actions

Transformations

Example: Word count


Functional programming in Python


Lambda expression

Creates small, one-time, anonymous function objects in Python

Syntax: lambda arguments: expression

Takes any number of arguments

Body is a single expression, whose value is returned

Can be used together with map(), filter(), and functools.reduce(), or inside another function

Example:

Add: add = lambda x, y: x + y

Equivalent named function: def add(x, y): return x + y

type(add) returns <class 'function'>

add(2, 3) returns 5
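The slide mentions using lambdas with map, filter, and reduce, or inside another function. A minimal sketch of each use follows; the list nums and the helper make_multiplier are hypothetical, chosen only for illustration. Note that in Python 3, reduce must be imported from functools.

```python
from functools import reduce  # in Python 3, reduce lives in functools

nums = [1, 2, 3, 4, 5]  # hypothetical sample data

# map: apply the lambda to every element
squares = list(map(lambda x: x * x, nums))

# filter: keep only the elements for which the lambda returns True
evens = list(filter(lambda x: x % 2 == 0, nums))

# reduce: fold the list into one value with a two-argument lambda
total = reduce(lambda x, y: x + y, nums)

# lambda inside another function (hypothetical helper):
# make_multiplier returns a new function that multiplies by n
def make_multiplier(n):
    return lambda x: x * n

triple = make_multiplier(3)

print(squares, evens, total, triple(4))  # [1, 4, 9, 16, 25] [2, 4] 15 12
```

The same lambda style carries over to Spark, where RDD transformations such as map and filter also take functions as arguments.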


Crash course in Spark
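The agenda's word-count example is not included in this excerpt. As a hedged stand-in, the sketch below reproduces the usual word-count pipeline (split lines into words, pair each word with 1, sum the counts per word) in plain Python with lambdas, so it runs without a Spark installation. The steps only mimic the RDD operations flatMap, map, and reduceByKey; they are not Spark's API. The input lines and the helper add_pair are hypothetical.

```python
from functools import reduce

# Hypothetical input: each string plays the role of one line of a text file
lines = ["spark is fast", "spark is fun"]

# Like flatMap: split every line and flatten into one list of words
words = [w for line in lines for w in line.split()]

# Like map: turn each word into a (word, 1) pair
pairs = list(map(lambda w: (w, 1), words))

# Like reduceByKey: add up the 1s for each distinct word
def add_pair(counts, pair):
    word, n = pair
    counts[word] = counts.get(word, 0) + n
    return counts

counts = reduce(add_pair, pairs, {})
print(counts)  # {'spark': 2, 'is': 2, 'fast': 1, 'fun': 1}
```

With PySpark available, the same pipeline is commonly written as sc.textFile(path).flatMap(lambda line: line.split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b), where sc is a SparkContext and path is a placeholder. Spark then performs the per-key summation in parallel across partitions.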
