Spark - Read JSON file to RDD - Example


JSON has become one of the most common data formats exchanged between applications and between nodes on the internet.

In this tutorial, we shall learn how to read a JSON file into an RDD with the help of SparkSession, DataFrameReader and Dataset.toJavaRDD().

Steps to Read JSON file to Spark RDD

To read a JSON file into a Spark RDD:

1. Create a SparkSession.

SparkSession spark = SparkSession
    .builder()
    .appName("Spark Example - Read JSON to RDD")
    .master("local[2]")
    .getOrCreate();

2. Get the DataFrameReader of the SparkSession.

spark.read()

3. Use DataFrameReader.json(String jsonFilePath) to read the contents of the JSON file into a Dataset.

spark.read().json(jsonPath)

4. Use Dataset.toJavaRDD() to convert the Dataset to a JavaRDD.

spark.read().json(jsonPath).toJavaRDD()
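The steps above can also be written with each intermediate object held in its own typed variable, which makes it easier to see what every call returns. The following is a minimal sketch, not the only way to structure it; the class name JsonToRddSteps and the helper readJson are illustrative names that do not come from Spark or from this tutorial.

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.DataFrameReader;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Hypothetical class name, used only for this illustration.
public class JsonToRddSteps {

    // Hypothetical helper: performs steps 2 to 4 for the given session and path.
    static JavaRDD<Row> readJson(SparkSession spark, String jsonPath) {
        DataFrameReader reader = spark.read();         // step 2: get the DataFrameReader
        Dataset<Row> dataset = reader.json(jsonPath);  // step 3: JSON file -> Dataset<Row>
        return dataset.toJavaRDD();                    // step 4: Dataset<Row> -> JavaRDD<Row>
    }

    public static void main(String[] args) {
        // step 1: create the SparkSession
        SparkSession spark = SparkSession
            .builder()
            .appName("Spark Example - Read JSON to RDD (steps)")
            .master("local[2]")
            .getOrCreate();

        JavaRDD<Row> rdd = readJson(spark, "data/employees.json");
        System.out.println("Number of rows: " + rdd.count());

        spark.stop();
    }
}
```

Typing the result as JavaRDD<Row> rather than a raw JavaRDD lets the compiler check later operations on the rows.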

Example: Spark - Read JSON file to RDD

Following is a Java program that reads a JSON file into a Spark RDD and prints its contents.

employees.json

{"name":"Michael", "salary":3000}
{"name":"Andy", "salary":4500}
{"name":"Justin", "salary":3500}
{"name":"Berta", "salary":4000}
{"name":"Raju", "salary":3000}
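By default, DataFrameReader.json() expects JSON Lines input, that is, one complete JSON object per line of the file, so each record of employees.json needs to sit on its own line. If you want to generate the sample file rather than type it, a small plain-Java sketch such as the following could be used. The class name EmployeesFile and its write method are made up for this illustration, and the target path data/employees.json is assumed from the example program.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, used only to create the sample input file.
public class EmployeesFile {

    // One JSON object per list element; each element becomes one line of the file.
    static final List<String> RECORDS = Arrays.asList(
        "{\"name\":\"Michael\", \"salary\":3000}",
        "{\"name\":\"Andy\", \"salary\":4500}",
        "{\"name\":\"Justin\", \"salary\":3500}",
        "{\"name\":\"Berta\", \"salary\":4000}",
        "{\"name\":\"Raju\", \"salary\":3000}");

    // Writes the records to the target path, creating parent directories if needed.
    static void write(Path target) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        Files.write(target, RECORDS, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        write(Paths.get("data", "employees.json"));
    }
}
```

A file that holds a single JSON document spread over several lines would instead need the reader's multiLine option, for example spark.read().option("multiLine", true).json(jsonPath).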

JSONtoRDD.java

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JSONtoRDD {
    public static void main(String[] args) {
        // configure spark
        SparkSession spark = SparkSession
            .builder()
            .appName("Spark Example - Read JSON to RDD")
            .master("local[2]")
            .getOrCreate();

        // read JSON file to RDD
        String jsonPath = "data/employees.json";
        JavaRDD<Row> items = spark.read().json(jsonPath).toJavaRDD();

        // print each row of the RDD
        items.foreach(item -> {
            System.out.println(item);
        });
    }
}

Output

[Michael,3000]
[Andy,4500]
[Justin,3500]
[Berta,4000]
[Raju,3000]
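Each element printed above is a Row, and its fields can be read by name instead of printing the whole row. The following is a hedged sketch of that idea: the class EmployeeNames, the method highEarners and the salary threshold are assumptions made for illustration. It relies on Spark's JSON schema inference reading whole-number values such as salary as long, and it assumes no record has a missing salary.

```java
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Hypothetical class name, used only for this illustration.
public class EmployeeNames {

    // Returns the names of employees whose salary is at least minSalary.
    static List<String> highEarners(SparkSession spark, String jsonPath, long minSalary) {
        JavaRDD<Row> rows = spark.read().json(jsonPath).toJavaRDD();
        return rows
            // whole-number JSON values are inferred as long, hence <Long>
            .filter(row -> row.<Long>getAs("salary") >= minSalary)
            .map(row -> row.<String>getAs("name"))
            .collect();
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession
            .builder()
            .appName("Spark Example - Read fields from Row")
            .master("local[2]")
            .getOrCreate();

        List<String> names = highEarners(spark, "data/employees.json", 4000L);
        names.forEach(System.out::println);

        spark.stop();
    }
}
```

With the sample employees.json and a threshold of 4000, this selects Andy and Berta. Note that collect() brings the results to the driver, which is fine for a small file like this one but should be used with care on large datasets.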

Conclusion

In this Spark tutorial, we have learnt how to read a JSON file into a Spark RDD with the help of an example Java program.
