Intro To Spark - PSC

Spark Formula

1. Create/Load RDD
   Webpage visitor IP address log
2. Transform RDD
   “Filter out all non-U.S. IPs”
3. But don’t do anything yet!
   Wait until data is actually needed
   Maybe apply more transforms (“Distinct IPs”)
4. Perform Actions that return data
   Count: “How many unique U.S. visitors?”
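The four steps can be sketched in plain Python. This is a toy model, not Spark itself: `LazyDataset` is a hypothetical stand-in for an RDD, written here only to show that transforms are recorded and nothing runs until an action asks for data. The method names `filter`, `distinct`, and `count` mirror the real RDD methods. The log records, with a country code attached to each IP, are made up for illustration.

```python
class LazyDataset:
    """Toy stand-in for an RDD: transforms are recorded, not executed."""

    def __init__(self, source, steps=()):
        self._source = source   # a re-iterable collection of records
        self._steps = steps     # pending transforms, applied in order

    def filter(self, predicate):
        # Transformation: returns a new dataset; no record is touched yet.
        step = lambda records: (r for r in records if predicate(r))
        return LazyDataset(self._source, self._steps + (step,))

    def distinct(self):
        # Transformation: dict.fromkeys drops duplicates, keeping first-seen order.
        step = lambda records: iter(dict.fromkeys(records))
        return LazyDataset(self._source, self._steps + (step,))

    def count(self):
        # Action: only now is the recorded pipeline actually run.
        records = iter(self._source)
        for step in self._steps:
            records = step(records)
        return sum(1 for _ in records)


# 1. Create/Load: a hypothetical visitor log of (ip, country_code) records.
visitor_log = [
    ("192.0.2.10", "US"),
    ("198.51.100.7", "DE"),
    ("192.0.2.10", "US"),
    ("203.0.113.5", "US"),
    ("198.51.100.9", "FR"),
]

predicate_calls = []  # tracks when the filter really runs


def is_us(record):
    predicate_calls.append(record)
    return record[1] == "US"


# 2. and 3. Transform, but don't do anything yet.
us_visitors = LazyDataset(visitor_log).filter(is_us).distinct()
calls_before_action = len(predicate_calls)

# 4. Perform an action that returns data.
unique_us_visitors = us_visitors.count()
calls_after_action = len(predicate_calls)

print(calls_before_action, unique_us_visitors, calls_after_action)
```

The filter predicate is not called while the pipeline is being built, only once `count()` runs. In real Spark the same deferral lets the engine see the whole chain of transforms before it schedules any work.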