Install Spark on Mac (PySpark)
--
The video above demonstrates one way to install Spark (PySpark) on a Mac, and the instructions below walk through the same process. If you have any questions, leave a comment here or on YouTube (please subscribe if you can)!
Prerequisites: Anaconda. If you already have Anaconda installed, skip ahead to downloading Spark.
- Download and install Anaconda. If you need help, please see this tutorial.
- Go to the Apache Spark website (link)
a) Choose a Spark release
b) Choose a package type
c) Choose a download type: (Direct Download)
d) Download Spark
2. Make sure you have Java installed on your machine.
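One way to check for Java from the terminal is sketched below. It assumes only that `java`, if installed, is reachable on your `PATH`; `java_status` is a variable name chosen for this illustration.

```shell
# Ask the Java launcher for its version; a non-zero exit status means
# no usable Java runtime was found on the PATH.
if java -version >/dev/null 2>&1; then
    java_status="found"
    # java prints its version banner to stderr, so redirect it to stdout.
    java -version 2>&1 | head -n 1
else
    java_status="missing"
    echo "No Java runtime found; install a JDK before continuing."
fi
```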
3. Go to your home directory with the following command.
cd ~
4. Extract the downloaded archive in your home directory using the following command (the file name must match the release you downloaded).
tar -zxvf spark-1.6.0-bin-hadoop2.6.tgz
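As a hedged sketch, the extraction can be wrapped in a check that the archive is where the command expects it. This assumes the 1.6.0 / Hadoop 2.6 package named above was saved or moved to your home directory; `archive` and `extract_status` are illustrative names, and the file name should be adjusted to your download.

```shell
# Assumed file name: change it to match the release you downloaded.
archive="$HOME/spark-1.6.0-bin-hadoop2.6.tgz"

if [ ! -f "$archive" ]; then
    extract_status="archive-not-found"
    echo "Expected $archive; move the download there or edit the path."
elif tar -zxf "$archive" -C "$HOME"; then
    # -C extracts into the home directory regardless of the current directory.
    extract_status="extracted"
else
    extract_status="extract-failed"
fi
echo "$extract_status"
```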
5. Use the following command to see that you have a .bash_profile
ls -a
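A caveat worth checking at this point: on macOS Catalina and later the default login shell is zsh, which reads ~/.zshrc rather than ~/.bash_profile. The sketch below picks the file to edit based on the `SHELL` environment variable. The `profile` variable is an illustrative name, and the mapping is an assumption that only distinguishes zsh from everything else.

```shell
# Choose the startup file that matches the login shell.
case "$(basename "${SHELL:-/bin/bash}")" in
    zsh) profile="$HOME/.zshrc" ;;
    *)   profile="$HOME/.bash_profile" ;;
esac

if [ -f "$profile" ]; then
    echo "Found $profile"
else
    echo "$profile does not exist yet; saving it from nano will create it."
fi
```

If this reports ~/.zshrc, the edits described in the following steps would go in that file instead.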
6. Next, we will edit our .bash_profile so we can open a Spark notebook from any directory.
nano .bash_profile
7. Don’t remove anything in your .bash_profile. Only add the following
Notes: The PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS environment variables are used to launch the PySpark shell in Jupyter Notebook. The --master option sets the master node address; here we launch Spark locally on 2 cores for local testing.
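The lines to add are not reproduced in this text. A minimal sketch consistent with the notes above might look like the following, assuming Spark was extracted to ~/spark-1.6.0-bin-hadoop2.6 in the extraction step. The `SPARK_PATH` variable and the `snotebook` alias are names assumed here for illustration and are not confirmed by the text.

```shell
# Assumed install location from the extraction step; adjust to your release.
export SPARK_PATH="$HOME/spark-1.6.0-bin-hadoop2.6"

# Make the pyspark launcher start Jupyter Notebook as the driver front end.
export PYSPARK_DRIVER_PYTHON="jupyter"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"

# Hypothetical alias name: start PySpark locally on 2 cores from any directory.
alias snotebook='"$SPARK_PATH"/bin/pyspark --master "local[2]"'
```

With lines like these loaded, typing `snotebook` in any directory would start a notebook server there. The quotes around `local[2]` keep shells such as zsh from treating the brackets as a file pattern.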
8. Type the following into your terminal
source .bash_profile
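To confirm the profile was picked up, a quick check along these lines can help. It assumes the two variables named in the notes were exported from the profile; `check_pyspark_env` is a hypothetical helper name.

```shell
# Print each expected variable, or flag it when the shell does not have it.
check_pyspark_env() {
    for name in PYSPARK_DRIVER_PYTHON PYSPARK_DRIVER_PYTHON_OPTS; do
        # printenv exits non-zero when the variable is not in the environment.
        if value="$(printenv "$name")"; then
            echo "$name=$value"
        else
            echo "$name is not set"
        fi
    done
}

check_pyspark_env
```

If either variable shows as not set, the profile was probably not sourced in the current terminal.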
Please let me know if you have any questions. You can also test your PySpark installation here!
Common issues: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
The native Hadoop library is supported on Linux-type platforms only (it does not work with Cygwin or Mac OS X), so we need a workaround if this prevents PySpark from working.
1. Download the Hadoop binary (link, basically another file) and put it in your home directory
(you can choose a different Hadoop version if you like and change the next steps accordingly)
2. Extract the archive in your home directory using the following command.
tar -zxvf hadoop-2.8.0.tar.gz
3. Now add export HADOOP_HOME=~/hadoop-2.8.0 to your .bash_profile. Open a new terminal and try again.
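If you would rather add the line from the terminal than in an editor, one option is an append that skips the line when it is already present. This is a sketch under assumptions: `append_once` is a hypothetical helper, and the path matches the hadoop-2.8.0 folder extracted above.

```shell
# Append a line to a file only if that exact line is not already there.
append_once() {
    file="$1"
    line="$2"
    # -x matches whole lines, -F treats the text literally, -q stays silent.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (edits your real profile, so it is left commented out here):
# append_once "$HOME/.bash_profile" 'export HADOOP_HOME=~/hadoop-2.8.0'
```

Running the helper twice with the same arguments leaves a single copy of the line, so repeating the step does not clutter the profile.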
Concluding Remarks
Please let me know if you have any questions! I am happy to answer them in the comments section below, on the YouTube video page, or through Twitter.