Installing and Running Spark on Mac OS (Locally)

Julius Wang
Data Science Canvas
2 min read · Apr 18, 2016

It took me a while to get Spark running on my Mac, even with the help of the instructions in John Ramey’s post. I’m using Mac OS X 10.11.2 with 8 GB of RAM and Spark 1.6.0, and I got stuck while following the steps presented in that blog. The problems were:

  • I could not successfully build Spark with
$ sbt assembly

the errors were something like

[error] java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
[error] Use 'last' for the full log.
  • when the above problem was solved and I tried to run code like
import os
spark_home = os.environ.get('SPARK_HOME', None)
text_file = sc.textFile(spark_home + "/README.md")

it returned the error message

Exception: Java gateway process exited before sending the driver its port number

So, I’ve summarised the instructions for solving these issues below. Some of the content is redundant with John Ramey’s post (I’ve simply copied those instructions here; I’m obviously not the original author), but at least you won’t have to switch between blogs to get things done.

Download, Install and Build Spark

  1. Download Spark here: I used Spark 1.6.0
  2. Unzip it somewhere, e.g. ~/Projects/spark-1.6.0/
  3. Go to the directory where you unzipped Spark: $ cd ~/Projects/spark-1.6.0/
  4. Install the Scala build tool: $ brew install sbt
  5. Build (assemble) Spark with the following command (be conscious of the 2048 parameter there; see the explanation below it), then sit back and wait for a while:
$ sbt/sbt assembly -mem 2048

This is where my instructions differ from John Ramey’s post: the 2048 parameter specifies the heap size (in MB) that sbt runs with. I can no longer find the source, but someone suggested setting it to 1024 if your computer has 4 GB of RAM; mine has 8 GB, so I simply doubled it to 2048, and that worked. With 256, 512 or 1024 the assembly always failed and returned the OutOfMemory error.

Setting up environment variables

  • Edit your ~/.bashrc or ~/.bash_profile with
$ vi ~/.bashrc
  • Add the following content to ~/.bashrc
export SPARK_HOME=/Users/julius/Projects/spark-1.6.0
export PATH=$SPARK_HOME/bin:$PATH
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH
export PYSPARK_SUBMIT_ARGS=pyspark-shell

The last line (setting PYSPARK_SUBMIT_ARGS=pyspark-shell) is actually how I solved the Java gateway exception mentioned above.
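
As an alternative to editing ~/.bashrc, the same variables can also be set from within Python before the SparkContext is created. This is only a minimal sketch, assuming the install location from step 2 above; adjust the paths to your own setup:
import os
import sys

# Assumed install location from step 2 above -- change this to your own path
spark_home = os.path.expanduser('~/Projects/spark-1.6.0')

os.environ.setdefault('SPARK_HOME', spark_home)
os.environ.setdefault('PYSPARK_SUBMIT_ARGS', 'pyspark-shell')

# Same effect as the PYTHONPATH line in ~/.bashrc
sys.path.insert(0, os.path.join(spark_home, 'python'))
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.9-src.zip'))

from pyspark import SparkContext

sc = SparkContext('local[*]', 'env-check')
print(sc.version)  # should print 1.6.0
sc.stop()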

  • Remember to force the system to apply the new variables
$ source ~/.bashrc
  • Just in case your system still won’t apply the changes, reboot it
  • You can check whether the changes have been added to the system variables with the following Python script
# Just make sure that every variable you 'exported' is present in the printed messages
import os, sys
print(os.environ.get('SPARK_HOME'))
print(os.environ.get('PATH'))
print(os.environ.get('PYSPARK_SUBMIT_ARGS'))
print(sys.path)

Run a word count example from wordcount.py
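
As a rough sketch (assuming Spark 1.6’s RDD API; the README.md input path is just an example, not necessarily the exact script referenced here), a minimal wordcount.py could look like this:
# wordcount.py -- a minimal word count with the Spark 1.6 RDD API
import sys
from pyspark import SparkContext

if __name__ == "__main__":
    sc = SparkContext(appName="PythonWordCount")
    # Count words in Spark's own README.md, or in whatever file is passed as the first argument
    path = sys.argv[1] if len(sys.argv) > 1 else "README.md"
    counts = (sc.textFile(path)
                .flatMap(lambda line: line.split())
                .map(lambda word: (word, 1))
                .reduceByKey(lambda a, b: a + b))
    for word, count in counts.collect():
        print("%s: %d" % (word, count))
    sc.stop()

With the PYTHONPATH and PYSPARK_SUBMIT_ARGS settings above, you can run it with plain python wordcount.py; $SPARK_HOME/bin/spark-submit wordcount.py also works and doesn’t rely on the PYTHONPATH changes.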

You should then see some INFO messages from Spark, followed by the word counting results.
