Intro to Apache Spark - University of California, Berkeley

Paco Nathan, @pacoid download slides:

Licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License

Lecture Outline:

• log in and get started with Apache Spark on Databricks Cloud

• understand the theory of operation in a cluster

• brief historical context for Spark and where it fits with other Big Data frameworks

• coding exercises: ETL, WordCount, Join, Workflow

• tour of the Spark API

• follow-up: certification, events, community resources, etc.

Getting Started

Getting Started:

Everyone will receive a username/password for one of the Databricks Cloud shards:

•

•

Run notebooks on your account at any time during the course. The accounts will stay open afterwards, long enough for you to save or export your work.

Getting Started:

Workspace/databricks-guide/01 Quick Start

Open in a browser window, then follow the discussion of the notebook's key features:
