Install Spark/PySpark on Mac and Fix of Some Common Errors

Angie Li
Apr 4, 2018 · 4 min read

I’ve been working on a big data project that analyzes real-time system logs to classify patterns and errors. Spark is particularly helpful here since it supports streaming data and, most importantly, it can be used with Python. I’ve successfully installed Spark on my local machine, but I ran into some errors during the installation. So in this post, I would like to share my installation steps and how I fixed some of the common errors that you might also encounter.

I followed most of the steps (open Jupyter Notebook by calling and initializing pyspark) from Michael Galarnyk’s post Install Spark on Mac (PySpark). But I’ve shortened the installation part by using Homebrew.

The errors (just to give you a sneak peek before I share the fixes):

  • Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
  • Java gateway process exited before sending the driver its port number

You can also check my GitHub repo here.

1. Install Spark/PySpark

Spark also requires a specific version of Java (Java 8), which we can install with Homebrew as well.

  1. Open a terminal and enter $ brew install apache-spark
  2. If the installation stops with an error message about the missing Java requirement, enter $ brew cask install caskroom/versions/java8 to install Java 8
  3. Check that pyspark is properly installed by entering $ pyspark. You should see the Spark welcome banner followed by a Python prompt, which means Spark is installed and you are all set.
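
If you would rather confirm the installation from Python than by eye, a minimal sketch like the one below looks up the commands this post relies on with the standard library's shutil.which. The helper name find_tools and the list of command names are my own choices for illustration, not part of Spark or Homebrew.

```python
import shutil


def find_tools(names):
    """Map each command name to its full path on PATH, or None if it is not found."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Command names assumed from the steps in this post; adjust to your setup.
    for name, path in find_tools(["java", "pyspark", "jupyter"]).items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

A None entry only tells you the command is not on the PATH of the process running the check; it says nothing about which version is installed.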

2. Open Jupyter Notebook with PySpark Ready

Jupyter Notebook is a very convenient tool for writing and saving code, so in this post I will share how to create a global profile so that Jupyter Notebook starts with a SparkContext already initialized.
To create a global profile for your terminal session, you need to create or modify your .bash_profile or .bashrc file. Here, I will use .bash_profile as my example.

  1. Check whether you have a .bash_profile in your home directory with $ ls -a ~. If you don't have one, create one using $ touch ~/.bash_profile
  2. Open .bash_profile with $ vim ~/.bash_profile, press i to enter insert mode, and paste the following lines anywhere in the file (DO NOT delete anything already in your file):
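# Path to your Spark folder; change it to match your own installation
export SPARK_PATH=~/spark-1.6.0-bin-hadoop2.6
# Make pyspark start Jupyter Notebook as the driver instead of the plain Python shell
export PYSPARK_DRIVER_PYTHON="jupyter"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
# For Python 3, you have to add the line below or you will get an error
# export PYSPARK_PYTHON=python3
alias snotebook='$SPARK_PATH/bin/pyspark --master local[2]'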

(credit to Michael Galarnyk)
  3. Press ESC to exit insert mode, then enter :wq to save and exit Vim. You can find more Vim commands here
  4. Refresh the terminal profile with $ source ~/.bash_profile
  5. You should now be able to open Jupyter Notebook simply by calling $ pyspark
  6. To check that your notebook is initialized with a SparkContext, try the following code in your notebook, or check my notebook here:

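from pyspark import SparkContext
import numpy as np
# Reuse the SparkContext that pyspark already started, or create one
sc = SparkContext.getOrCreate()
TOTAL = 1000000
# Random 2-D points with each coordinate in [-1, 1), built on the driver and cached as an RDD
dots = sc.parallelize([2.0 * np.random.random(2) - 1.0 for _ in range(TOTAL)]).cache()
print("Number of random points:", dots.count())
# Summary statistics of the points
stats = dots.stats()
print('Mean:', stats.mean())
print('stdev:', stats.stdev())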
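
As a rough guide to what the last two lines should print: each coordinate is drawn uniformly from [-1, 1), so the mean should be close to 0 and the standard deviation close to 1/sqrt(3), roughly 0.577. The sketch below reproduces that expectation without Spark, using only the standard library. The function name uniform_stats, the sample size, and the seed are assumptions made for this illustration.

```python
import math
import random


def uniform_stats(n, seed=0):
    """Return (mean, population stdev) of n samples of 2.0 * U - 1.0, with U uniform in [0, 1)."""
    rng = random.Random(seed)
    xs = [2.0 * rng.random() - 1.0 for _ in range(n)]
    mean = math.fsum(xs) / n
    var = math.fsum((x - mean) ** 2 for x in xs) / n
    return mean, math.sqrt(var)


if __name__ == "__main__":
    mean, stdev = uniform_stats(100_000)
    print("mean:", mean)    # expected to be near 0
    print("stdev:", stdev)  # expected to be near the value printed below
    print("1/sqrt(3):", 1 / math.sqrt(3))
```

If the numbers from your notebook are far from these, the problem is more likely in the notebook code than in the Spark installation.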

3. Common Errors

The first error, “Unable to load native-hadoop library for your platform… using builtin-java classes where applicable”, seems to be quite common for people who are trying to install Hadoop. It means the native Hadoop library could not be loaded on your machine, so Hadoop falls back to its built-in Java classes; a commonly cited cause is running a Hadoop library compiled for 32-bit on a 64-bit OS. I also had this error and tried several methods. I still seemed to have the error afterwards, but once I followed the steps above to launch Jupyter Notebook it was gone, and it didn’t have any impact on using SparkContext in the notebook. If anyone knows any other methods, please do let me know.

Possible solution 1: download and install the Hadoop binary in your home directory and add the following line to your .bash_profile, remembering to change the version to yours:

export HADOOP_HOME=~/hadoop-2.8.0

Possible solution 2: add “native” to HADOOP_OPTS:

export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"

Possible solution 3: similar to solution 2, but add one more line to specify the “native” location:

export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
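
Solutions 2 and 3 both point java.library.path at a folder under HADOOP_HOME, so it can help to see what that folder actually contains before choosing between them. The following is a minimal sketch under that assumption. list_native_dir is a hypothetical helper written for this post, and it relies only on the HADOOP_HOME variable and the lib/native path that appear above.

```python
import os
from pathlib import Path


def list_native_dir(hadoop_home):
    """Return the sorted file names in <hadoop_home>/lib/native, or [] if that folder is missing."""
    native = Path(hadoop_home).expanduser() / "lib" / "native"
    if not native.is_dir():
        return []
    return sorted(p.name for p in native.iterdir())


if __name__ == "__main__":
    home = os.environ.get("HADOOP_HOME")
    if home is None:
        print("HADOOP_HOME is not set")
    else:
        print(list_native_dir(home) or "lib/native is missing or empty")
```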

The second error, “Java gateway process exited before sending the driver its port number”, is usually caused by JAVA_HOME not being set, so adding the following line to your .bash_profile should do the trick:

export JAVA_HOME=/Library/Java/Home

Julius Wang also shared another possible cause and fix, setting up SPARK_HOME, that you could try; remember to change the Spark version to the version you have:

export SPARK_HOME=/<your spark installation location>/spark-1.6.0
export PYSPARK_SUBMIT_ARGS=pyspark-shell
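
Since both fixes come down to environment variables, one quick way to see whether they reached your notebook is to inspect them from Python. This is only a sketch: check_env_dirs is a name I made up, and it checks nothing more than whether each variable is set and points at an existing directory, which does not guarantee a working Java or Spark installation.

```python
import os


def check_env_dirs(names, env=None):
    """Classify each variable as 'unset', 'not a directory', or 'ok'."""
    env = os.environ if env is None else env
    report = {}
    for name in names:
        value = env.get(name)
        if not value:
            report[name] = "unset"
        elif not os.path.isdir(os.path.expanduser(value)):
            report[name] = "not a directory"
        else:
            report[name] = "ok"
    return report


if __name__ == "__main__":
    # Variable names taken from the fixes above.
    for name, status in check_env_dirs(["JAVA_HOME", "SPARK_HOME"]).items():
        print(f"{name}: {status}")
```

Run it in the same notebook that raises the error, since a terminal opened before you edited .bash_profile may still carry the old values.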

4. Some other useful commands

  • If you want to uninstall any previous version of Java to make a clean installation of Java 8, use the following commands:
sudo rm -fr /Library/Internet\ Plug-Ins/JavaAppletPlugin.plugin 
sudo rm -fr /Library/PreferencePanes/JavaControlPanel.prefPane
sudo rm -fr ~/Library/Application\ Support/Java
  • If you want to uninstall Spark, use $ brew remove --force apache-spark

If you have any questions regarding the above steps, or if you encounter any other errors, let me know and I will try to help.

Anyway, I’m just a newbie who hasn’t been studying Python, Spark, or machine learning for very long, but I’m more than willing to discuss these topics and learn from all of you. By the way, I worked in technical support when I was in college, so at least I’m confident working in the command line and I’m very good at finding solutions through Google search.
