The video above demonstrates one way to install Spark (PySpark) on Mac. The following instructions guide you through the installation process. If you have any questions, leave a comment here or on YouTube (please subscribe if you can)!
Prerequisites: Anaconda. If you already have Anaconda installed, skip to step 2.
- Download and install Anaconda. If you need help, please see this tutorial.
- Go to the Apache Spark website (link)
a) Choose a Spark release
b) Choose a package type
c) Choose a download type: (Direct Download)
d) Download Spark
2. Make sure you have Java installed on your machine.
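One quick way to check for Java from the terminal is sketched below. It only reports whether a `java` command is on your PATH and prints its version; the `java_found` variable is an illustrative name, not something Spark uses.

```shell
# Check whether a java executable is on the PATH.
if command -v java >/dev/null 2>&1; then
  # java prints its version information to stderr, so merge it into stdout.
  java -version 2>&1
  java_found=yes
else
  echo "Java not found; install a JDK before continuing."
  java_found=no
fi
```

If the check reports that Java is missing, install a JDK and run it again before moving on.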
3. Go to your home directory using the following command.
cd ~
4. Unzip the folder in your home directory using the following command.
tar -zxvf spark-1.6.0-bin-hadoop2.6.tgz
5. Use the following command to see that you have a .bash_profile
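A minimal sketch of such a check follows. Running `ls -a ~` also works, since it lists hidden files in your home directory. Creating an empty file when none exists is an assumption on my part, made so the next steps have a file to edit.

```shell
# Look for .bash_profile in the home directory.
if [ -f "$HOME/.bash_profile" ]; then
  echo ".bash_profile exists"
else
  # Creating an empty profile is harmless and gives the next step a file to edit.
  touch "$HOME/.bash_profile"
  echo ".bash_profile created"
fi
```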
6. Next, we will edit our .bash_profile so we can open a spark notebook in any directory.
7. Don’t remove anything in your .bash_profile. Only add the following
Notes: The PYSPARK_DRIVER_PYTHON parameter and the PYSPARK_DRIVER_PYTHON_OPTS parameter are used to launch the PySpark shell in Jupyter Notebook. The --master parameter is used for setting the master node address. Here we launch Spark locally on 2 cores for local testing.
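As a minimal sketch of what such an addition could look like, consistent with the notes above: the function name `spark_notebook` is hypothetical, the Spark path assumes the folder from step 4 sits in your home directory, and using `jupyter` as the driver is an assumption. Adjust all three to your setup.

```shell
# Hypothetical helper; the name spark_notebook is illustrative only.
spark_notebook() {
  # Assumed location: the folder unpacked in step 4.
  SPARK_PATH="$HOME/spark-1.6.0-bin-hadoop2.6"

  # Tell PySpark to start its driver inside Jupyter Notebook.
  export PYSPARK_DRIVER_PYTHON="jupyter"
  export PYSPARK_DRIVER_PYTHON_OPTS="notebook"

  # Run Spark locally on 2 cores.
  "$SPARK_PATH/bin/pyspark" --master "local[2]"
}
```

Because the two variables are exported inside a function, they are set only once you call it, so nothing changes in your shell until then.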
8. Type the following into your terminal
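One plausible command for this step, assuming the goal is to pick up the edits made to .bash_profile without opening a new terminal, is to reload the file in the current session. The `PROFILE` variable is an illustrative name.

```shell
PROFILE="$HOME/.bash_profile"

if [ -f "$PROFILE" ]; then
  # "." is the portable spelling of bash's "source".
  . "$PROFILE"
  echo "Reloaded $PROFILE"
else
  echo "No $PROFILE found"
fi
```

After reloading, anything defined in the profile is available in the current terminal.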
Please let me know if you have any questions. You can also test your PySpark installation here!
Common issues: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
The native Hadoop library is supported only on Linux-type platforms (it does not work with Cygwin or Mac OS X), so a workaround is needed if this warning stops PySpark from working.
1. Download the Hadoop binary (link, basically another file) and put it in your home directory
(you can choose a different hadoop version if you like and change the next steps accordingly)
2. Unzip the folder in your home directory using the following command.
tar -zxvf hadoop-2.8.0.tar.gz
3. Now add export HADOOP_HOME=~/hadoop-2.8.0 to your .bash_profile. Open a new terminal and try again.
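If you would rather not open an editor, the line can be appended from the terminal. The sketch below assumes Hadoop was unpacked to ~/hadoop-2.8.0 as in step 2 and adds the line only if it is not already present. `PROFILE` and `LINE` are illustrative variable names.

```shell
PROFILE="$HOME/.bash_profile"
# Single quotes keep the tilde literal in the file; it expands when the profile is loaded.
LINE='export HADOOP_HOME=~/hadoop-2.8.0'

# Append the export only if an identical line is not already in the profile.
grep -qxF "$LINE" "$PROFILE" 2>/dev/null || echo "$LINE" >> "$PROFILE"
```

Running it twice is safe, since the second run finds the existing line and leaves the file unchanged.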