Spark – Print contents of RDD - Tutorial Kart
Spark – Print contents of RDD
RDD (Resilient Distributed Dataset) is a fault-tolerant collection of elements that can be operated on in parallel.
To print the contents of an RDD, we can use the RDD collect action or the RDD foreach action.
RDD.collect() returns all the elements of the dataset as an array to the driver program; by looping over this array, we can print the elements of the RDD.
RDD.foreach(f) runs a function f on each element of the dataset.
In this tutorial, we will go through examples using the collect and foreach actions in Java and Python.
RDD.collect() – Print RDD – Java Example
In the following example, we will write a Java program that loads an RDD from a text file and prints its contents to the console using RDD.collect().
PrintRDD.java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class PrintRDD {
    public static void main(String[] args) {
        // configure spark
        SparkConf sparkConf = new SparkConf().setAppName("Print Elements of RDD")
                .setMaster("local[2]");
        // start a spark context
        JavaSparkContext sc = new JavaSparkContext(sparkConf);

        // read text file to RDD
        JavaRDD<String> lines = sc.textFile("data/rdd/input/file1.txt");

        // collect the RDD to the driver and print each element
        for (String line : lines.collect()) {
            System.out.println("* " + line);
        }

        sc.stop();
    }
}
file1.txt
Welcome to TutorialKart
Learn Apache Spark
Learn to work with RDD
Output
18/02/10 16:31:33 INFO DAGScheduler: Job 0 finished: collect at PrintRDD.java:18, took 0.726936 s
* Welcome to TutorialKart
* Learn Apache Spark
* Learn to work with RDD
18/02/10 16:31:33 INFO SparkContext: Invoking stop() from shutdown hook
RDD.collect() – Print RDD – Python Example
In the following example, we will write a Python program that loads an RDD from a text file and prints its contents to the console using RDD.collect().
print-rdd.py
from pyspark import SparkContext, SparkConf

if __name__ == "__main__":
    # create Spark context with Spark configuration
    conf = SparkConf().setAppName("Print Contents of RDD - Python")
    sc = SparkContext(conf=conf)
    # read input text file to RDD
    rdd = sc.textFile("data/rdd/input/file1.txt")
    # collect the RDD to a list
    list_elements = rdd.collect()
    # print the list
    for element in list_elements:
        print(element)
Run this Python program from terminal/command-prompt as shown below.
$ spark-submit print-rdd.py
Output
This is File 1
Welcome to TutorialKart
Learn Apache Spark
Learn to work with RDD
18/02/10 16:37:05 INFO SparkContext: Invoking stop() from shutdown hook
RDD.foreach() – Print RDD – Java Example
In the following example, we will write a Java program that loads an RDD from a text file and prints its contents to the console using RDD.foreach().
PrintRDD.java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.VoidFunction;

public class PrintRDD {
    public static void main(String[] args) {
        // configure spark
        SparkConf sparkConf = new SparkConf().setAppName("Print Elements of RDD")
                .setMaster("local[2]");
        // start a spark context
        JavaSparkContext sc = new JavaSparkContext(sparkConf);

        // read text file to RDD
        JavaRDD<String> lines = sc.textFile("data/rdd/input/file1.txt");

        // run the function on each element of the RDD
        lines.foreach(new VoidFunction<String>() {
            public void call(String line) {
                System.out.println("* " + line);
            }
        });

        sc.stop();
    }
}
RDD.foreach() – Print RDD – Python Example
In the following example, we will write a Python program that loads an RDD from a text file and prints its contents to the console using RDD.foreach().
print-rdd.py
from pyspark import SparkContext, SparkConf

if __name__ == "__main__":
    # create Spark context with Spark configuration
    conf = SparkConf().setAppName("Print Contents of RDD - Python")
    sc = SparkContext(conf=conf)
    # read input text file to RDD
    rdd = sc.textFile("data/rdd/input/file1.txt")

    # function to print an element
    def f(x):
        print(x)

    # apply f(x) to each element of the RDD
    rdd.foreach(f)
Conclusion
In this Spark tutorial – Print Contents of RDD – we have learnt to print the elements of an RDD using the collect and foreach RDD actions, with the help of Java and Python examples.