1 Apache Spark - Brigham Young University

You can also use spark.createDataFrame() on NumPy arrays and pandas DataFrames. DataFrames can be easily updated, queried, and analyzed using SQL operations. Spark allows you to run queries directly on DataFrames, similar to how you perform transformations on RDDs. Additionally, the pyspark.sql.functions module contains many additional functions to further ...