Install Spark on Mac (PySpark)

Jan 3, 2017

The video above demonstrates one way to install Spark (PySpark) on a Mac, and the following instructions walk you through the installation process. If you have any questions, leave a comment here or on YouTube (please subscribe if you can)!

Prerequisites: Anaconda. If you already have Anaconda installed, skip to step 1.

Download and install Anaconda. If you need help, please see this tutorial.

1. Go to the Apache Spark website (link)

a) Choose a Spark release

b) Choose a package type

c) Choose a download type: (Direct Download)

d) Download Spark

2. Make sure you have Java installed on your machine.
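One way to do this check from the terminal is sketched below, assuming a bash-compatible shell; `check_java` is a hypothetical helper name, not part of Spark or macOS. It runs `java -version` instead of only looking for a `java` command, because on macOS `/usr/bin/java` can exist as a placeholder even when no JDK is installed.

```shell
# Hypothetical helper: report whether a working Java runtime is available.
check_java() {
  # `java -version` writes to stderr and fails if no runtime is installed.
  if java -version >/dev/null 2>&1; then
    echo "Java found:"
    java -version 2>&1 | head -n 1
  else
    echo "Java not found; install a JDK before continuing."
    return 1
  fi
}

# `|| true` lets a script keep going after printing the message.
check_java || true
```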

3. Go to your home directory:

cd ~

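4. Extract the downloaded archive in your home directory using the following command. If your browser saved it to Downloads, move it to your home directory first, and change the file name to match the release you downloaded.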

tar -zxvf spark-1.6.0-bin-hadoop2.6.tgz
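Alternatively, tar's `-C` option can extract the archive straight into your home directory without moving it first. The sketch below wraps that in a hypothetical `extract_spark` function; the `~/Downloads` location in the example is an assumption about where your browser saves files.

```shell
# Hypothetical helper: extract a .tgz archive into a destination directory.
# Usage: extract_spark ARCHIVE [DEST]   (DEST defaults to $HOME)
extract_spark() {
  local archive="$1" dest="${2:-$HOME}"
  if [ ! -f "$archive" ]; then
    echo "Archive not found: $archive" >&2
    return 1
  fi
  # -z gunzip, -x extract, -f read from the archive file, -C change into DEST first
  tar -zxf "$archive" -C "$dest"
}

# Example (assumes the download landed in ~/Downloads):
# extract_spark ~/Downloads/spark-1.6.0-bin-hadoop2.6.tgz
```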

5. Use the following command to check whether you already have a .bash_profile (it is a hidden file, so plain ls will not show it):

ls -a
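The same check can be scripted; `profile_status` below is a hypothetical helper. One assumption worth stating: these steps target bash, but macOS Catalina (10.15) and later use zsh as the default shell, which reads ~/.zshrc rather than ~/.bash_profile, so on newer Macs the lines from step 7 may belong in ~/.zshrc instead.

```shell
# Hypothetical helper: say whether a shell profile file exists.
# Usage: profile_status [FILE]   (FILE defaults to ~/.bash_profile)
profile_status() {
  local file="${1:-$HOME/.bash_profile}"
  if [ -f "$file" ]; then
    echo "exists: $file"
  else
    # Not an error: nano creates the file when you save it in step 6.
    echo "missing: $file"
  fi
}

profile_status
```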

6. Next, we will edit our .bash_profile so we can open a Spark notebook from any directory. If the file does not exist yet, nano creates it when you save.

nano .bash_profile

7. Don't remove anything from your .bash_profile. Only add the following lines:
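A minimal sketch of lines consistent with the notes that follow is shown here, assuming Spark was extracted to ~/spark-1.6.0-bin-hadoop2.6 in step 4. `SPARK_PATH` and the `snotebook` alias are hypothetical names chosen for illustration; PYSPARK_DRIVER_PYTHON, PYSPARK_DRIVER_PYTHON_OPTS and --master are the settings the notes describe.

```shell
# Sketch of lines to append to ~/.bash_profile (names and paths are assumptions).
# Hypothetical variable: where step 4 extracted Spark; change it to match your release.
export SPARK_PATH="$HOME/spark-1.6.0-bin-hadoop2.6"
# Have the pyspark launcher start Jupyter Notebook as its driver front end.
export PYSPARK_DRIVER_PYTHON="jupyter"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
# Hypothetical alias: run PySpark locally with 2 worker threads.
# "local[2]" is quoted so the shell never treats the brackets as a glob.
alias snotebook='"$SPARK_PATH/bin/pyspark" --master "local[2]"'
```

With lines like these loaded, typing `snotebook` in a directory would start a notebook server there. Because the two PYSPARK_DRIVER_* variables are exported for every shell, they may also affect Python scripts launched with spark-submit; setting them only inside the alias is an alternative if that becomes a problem.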

Notes: The PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS variables are used to launch the PySpark shell in Jupyter Notebook. The --master parameter sets the master URL. Here we launch Spark locally with 2 worker threads (so it uses up to 2 cores) for local testing.

8. Run the following in your terminal so the changes take effect in the current session (new terminal windows pick them up automatically):

source .bash_profile
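To confirm that step 8 worked, you can check that the variables are exported in the current shell. The hypothetical `verify_pyspark_env` helper below uses printenv, which only sees exported variables; those are the ones the pyspark launcher inherits.

```shell
# Hypothetical helper: print each PySpark variable from step 7, or report it missing.
# Returns non-zero if any of them is not exported in the current shell.
verify_pyspark_env() {
  local missing=0 var value
  for var in PYSPARK_DRIVER_PYTHON PYSPARK_DRIVER_PYTHON_OPTS; do
    value="$(printenv "$var" || true)"
    if [ -n "$value" ]; then
      echo "$var=$value"
    else
      echo "$var is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

verify_pyspark_env || echo "Try again: source ~/.bash_profile"
```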

Please let me know if you have any questions. You can also test your PySpark installation here!

Common issues: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable

The native Hadoop library is supported only on Linux-type platforms (it does not work with Cygwin or the Mac OS X platform), so this warning is expected on a Mac. If it keeps PySpark from working, use the following workaround.

1. Download a Hadoop binary release (link) and put it in your home directory.

(You can choose a different Hadoop version if you like; just change the next steps accordingly.)

2. Extract the archive in your home directory using the following command.

tar -zxvf hadoop-2.8.0.tar.gz

3. Now add export HADOOP_HOME=~/hadoop-2.8.0 to your .bash_profile, then open a new terminal window and try again.
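If you would rather not open nano again, the line can be appended from the terminal. The sketch below uses a hypothetical `add_line_once` helper that appends only when the exact line is not already present, so running it twice does not duplicate the export.

```shell
# Hypothetical helper: append LINE to FILE unless FILE already contains it verbatim.
# Usage: add_line_once LINE [FILE]   (FILE defaults to ~/.bash_profile)
add_line_once() {
  local line="$1" file="${2:-$HOME/.bash_profile}"
  touch "$file"
  # -q quiet, -x match the whole line, -F treat LINE as a fixed string (no regex)
  if ! grep -qxF "$line" "$file"; then
    # If the file does not end with a newline, add one so lines don't merge.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
      echo >> "$file"
    fi
    printf '%s\n' "$line" >> "$file"
  fi
}

# Example (single quotes keep ~ literal so it expands when the profile is sourced):
# add_line_once 'export HADOOP_HOME=~/hadoop-2.8.0'
```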

Concluding Remarks

Please let me know if you have any questions! I am happy to answer them in the comments section below, on the YouTube video page, or through Twitter.


Written by Michael Galarnyk