EECS E6893 Big Data Analytics HW3: Twitter data analysis with Spark Streaming

Tingyu Li, tl2861@columbia.edu

10/04/2019

Spark Streaming



DStream

Represents a continuous stream of data; internally, a continuous series of RDDs
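
As a rough illustration of the DStream idea, the sketch below builds a word count over a socket text stream with the legacy PySpark DStream API. The host, port, batch interval, application name, and the helper names `split_words`, `build_word_counts`, and `main` are assumptions made for this example; they do not come from the assignment. Each transformation is applied to every RDD in the series, one RDD per batch interval.

```python
import sys


def split_words(line):
    """Lower-case a line of text and split it into words."""
    return line.lower().split()


def build_word_counts(lines):
    """Turn a DStream of text lines into a DStream of (word, count) pairs.

    The counts are computed separately for each batch, i.e. for each RDD
    in the series that makes up the DStream.
    """
    return (lines.flatMap(split_words)
                 .map(lambda word: (word, 1))
                 .reduceByKey(lambda a, b: a + b))


def main(host="localhost", port=9001, batch_seconds=5):
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="DStreamSketch")   # hypothetical app name
    ssc = StreamingContext(sc, batch_seconds)    # one RDD per batch interval
    lines = ssc.socketTextStream(host, port)     # assumed host and port
    build_word_counts(lines).pprint()            # print a sample of each batch
    ssc.start()
    ssc.awaitTermination()


# Only start the streaming job when explicitly requested.
if __name__ == "__main__" and "--run" in sys.argv:
    main()
```

To try it, something must already be listening on the assumed host and port; the script can then be started with `--run` as its argument.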

Architecture

Spark Streaming requests data from a Socket.

The Socket requests data from the Twitter API.

Streaming data is put into the Spark Context.

The Spark Context reads data from and writes data to Google Storage and BigQuery.
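
The Socket stage of this architecture can be sketched with the Python standard library alone. In the sketch below, a small TCP server sends newline-delimited text to the first client that connects, which is the format a Spark socket text stream would expect. The names `serve_lines` and `read_all` are hypothetical, the list of lines stands in for data that would really come from the Twitter API, and `read_all` is only a stand-in for the Spark receiver.

```python
import socket
import threading


def serve_lines(lines, host="127.0.0.1", port=0):
    """Serve each line, newline-terminated, to the first client that connects.

    Returns the server thread and the port actually bound (port=0 lets the
    operating system pick a free one).
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    bound_port = server.getsockname()[1]

    def run():
        conn, _ = server.accept()
        with conn:
            for line in lines:
                conn.sendall((line.rstrip("\n") + "\n").encode("utf-8"))
        server.close()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread, bound_port


def read_all(host, port):
    """Read from the socket until the server closes, then return the lines."""
    chunks = []
    with socket.create_connection((host, port), timeout=5) as client:
        while True:
            data = client.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8").splitlines()
```

For example, `thread, port = serve_lines(["first tweet", "second tweet"])` followed by `read_all("127.0.0.1", port)` returns the two lines in order. In a real pipeline the server would keep the connection open and forward data as it arrives instead of closing after a fixed list.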

Register on Twitter Apps
