Photo by Eric Han on Unsplash

PySpark on macOS: installation and use

René-Jean Corneille
4 min read · Oct 21, 2019

Spark is a very popular framework for data processing. It has gradually displaced Hadoop for data analytics. In-memory processing can make it up to 100x faster than Hadoop MapReduce. One of the main advantages of Spark is that there is no longer any need to write MapReduce jobs. Moreover, the Spark engine is compatible with a large number of data sources (text, JSON, XML, SQL and NoSQL data stores). Spark is with…

René-Jean Corneille

Director of ML. I write about data science, mlops, python and sometimes C++