A journey from Pandas to Spark Data Frames

While running multiple merge queries on a 100-million-row data frame, pandas ran out of memory. An Apache Spark data frame, on the other hand, completed the same operation within 10 seconds. Because a pandas data frame is not distributed, it lives in the memory of a single machine, so processing slows down, or fails outright, as the data grows.
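To make the comparison concrete, here is a minimal sketch of the same inner merge written once for pandas and once for Spark. It does not reproduce the 100-million-row run described above. The toy tables, the `id` join column, and the function names `merge_with_pandas` and `merge_with_spark` are all hypothetical, chosen only for illustration. The Spark half assumes a local PySpark installation and is defined but not called here.

```python
import pandas as pd


def merge_with_pandas(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    # pandas holds both inputs and the merged result in the memory of one
    # process, which is what runs out when the tables get very large.
    return left.merge(right, on="id", how="inner")


def merge_with_spark(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    # Imported lazily so the pandas half runs without a Spark installation.
    from pyspark.sql import SparkSession

    # Assumption: a local session; a real cluster would be configured differently.
    spark = (
        SparkSession.builder.master("local[*]").appName("merge-sketch").getOrCreate()
    )
    try:
        left_sdf = spark.createDataFrame(left)
        right_sdf = spark.createDataFrame(right)
        # Spark splits the join across partitions instead of one in-memory table.
        joined = left_sdf.join(right_sdf, on="id", how="inner")
        # toPandas() collects everything to the driver; fine for toy data only.
        return joined.toPandas()
    finally:
        spark.stop()


# Hypothetical toy data standing in for the large tables.
orders = pd.DataFrame({"id": [1, 2, 3], "amount": [10.0, 20.0, 30.0]})
customers = pd.DataFrame({"id": [2, 3, 4], "name": ["Ana", "Bo", "Cy"]})

pandas_result = merge_with_pandas(orders, customers)
print(pandas_result)
```

With PySpark installed, `merge_with_spark(orders, customers)` should return the same rows, though Spark does not guarantee their order. On data this small, pandas will usually be faster because Spark pays a startup and scheduling cost, so the distributed version only pays off once the data no longer fits comfortably on one machine.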