DataFrame and SQL abstractions

from pyspark.sql.functions import explode, split

# Read each line of the text files in input_folder into a DataFrame
# with a single string column named "value"
lines = spark.read.text(input_folder)

# Split the value column into words and explode the resulting list into
# multiple records; explode and split are column functions
words = lines.select(explode(split(lines.value, " ")).alias("word"))

# Group by word and apply the count function
wordCounts = words.groupBy("word").count()

# Print out the results
wordCounts.show()
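To make the transformations concrete, here is a minimal plain-Python sketch of what that pipeline computes, with Counter standing in for Spark's groupBy/count; the function name word_counts and the sample input are illustrative, not part of the Spark API.

```python
from collections import Counter

# Plain-Python analogue of the Spark pipeline: split each line on spaces
# (like split), flatten the per-line word lists into one stream of records
# (like explode), then count occurrences per word (like groupBy + count).
def word_counts(lines):
    words = [word for line in lines for word in line.split(" ")]
    return Counter(words)

# Hypothetical sample input standing in for the files in input_folder
sample = ["to be or not to be", "to be is to do"]
counts = word_counts(sample)
print(counts["to"], counts["be"])  # prints 4 3
```

Unlike this eager local version, the Spark statements only build a logical plan; nothing runs until an action such as show() is invoked.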

