Installing and Running Spark on Mac OS (Locally)

Julius Wang
Data Science Canvas
2 min read · Apr 18, 2016

It took me a while to get Spark running on my Mac, even with the help of the instructions in John Ramey’s post. I’m using Mac OS X 10.11.2 with 8 GB of RAM and Spark 1.6.0, and I got stuck while following the steps presented in that blog. The problems were:

  • I could not successfully build Spark with
$ sbt assembly

the errors were something like

[error] java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
[error] Use 'last' for the full log.
  • when the above problem was solved and I tried to run code like
import os
spark_home = os.environ.get('SPARK_HOME', None)
text_file = sc.textFile(spark_home + "/README.md")

it returned the error message

Exception: Java gateway process exited before sending the driver its port number

So, I’ve summarised the instructions for solving these issues below. Some of the content is redundant with John Ramey’s post (I’ve simply copied those instructions here; I’m obviously not the original author), but at least you won’t have to switch between blogs to get things done.

Download, Install and Build Spark

  1. Download Spark here: I used Spark 1.6.0
  2. Unzip it somewhere, e.g. ~/Projects/spark-1.6.0/
  3. Go to the directory where you unzipped Spark: $ cd ~/Projects/spark-1.6.0/
  4. Install the Scala build tool: $ brew install sbt
  5. Build (assemble) Spark with the following command (be conscious of the 2048 parameter there; see the explanation below it), then sit back and wait for a while:
$ sbt/sbt assembly -mem 2048

This is where my instructions differ from John Ramey’s post: the 2048 parameter specifies the heap size (in MB) that sbt runs with. I can no longer find the source, but someone suggested setting it to 1024 if your computer has 4 GB of RAM; mine has 8 GB, so I simply doubled it to 2048, and that worked. With 256, 512 or 1024 the assembly always failed and returned the OutOfMemory error.

Setting up environment variables

  • Edit your ~/.bashrc or ~/.bash_profile with
$ vi ~/.bashrc
  • Add the following content to ~/.bashrc
export SPARK_HOME=/Users/julius/Projects/spark-1.6.0
export PATH=$SPARK_HOME/bin:$PATH
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH
export PYSPARK_SUBMIT_ARGS=pyspark-shell

The last line (setting PYSPARK_SUBMIT_ARGS=pyspark-shell) is actually how I solved the Java gateway exception mentioned above.
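
As an alternative to editing ~/.bashrc, the same variables can also be set from within Python before the SparkContext is created. This is only a minimal sketch, assuming the install location from step 2 above; adjust the paths to your own setup:
import os
import sys

# Assumed install location from step 2 above -- change this to your own path
spark_home = os.path.expanduser('~/Projects/spark-1.6.0')

os.environ.setdefault('SPARK_HOME', spark_home)
os.environ.setdefault('PYSPARK_SUBMIT_ARGS', 'pyspark-shell')

# Same effect as the PYTHONPATH line in ~/.bashrc
sys.path.insert(0, os.path.join(spark_home, 'python'))
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.9-src.zip'))

from pyspark import SparkContext

sc = SparkContext('local[*]', 'env-check')
print(sc.version)  # should print 1.6.0
sc.stop()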

  • Remember to force the system to apply the new variables
$ source ~/.bashrc
  • Just in case your system still won’t apply the changes, reboot it
  • You can check whether the changes have been added to the system variables with the following Python script
# Just make sure that every variable you 'exported' is present in the printed messages
import os, sys
print(os.environ.get('SPARK_HOME'))
print(os.environ.get('PATH'))
print(os.environ.get('PYSPARK_SUBMIT_ARGS'))
print(sys.path)

Run a word count example from wordcount.py
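
As a rough sketch (assuming Spark 1.6’s RDD API; the README.md input path is just an example, not necessarily the exact script referenced here), a minimal wordcount.py could look like this:
# wordcount.py -- a minimal word count with the Spark 1.6 RDD API
import sys
from pyspark import SparkContext

if __name__ == "__main__":
    sc = SparkContext(appName="PythonWordCount")
    # Count words in Spark's own README.md, or in whatever file is passed as the first argument
    path = sys.argv[1] if len(sys.argv) > 1 else "README.md"
    counts = (sc.textFile(path)
                .flatMap(lambda line: line.split())
                .map(lambda word: (word, 1))
                .reduceByKey(lambda a, b: a + b))
    for word, count in counts.collect():
        print("%s: %d" % (word, count))
    sc.stop()

With the PYTHONPATH and PYSPARK_SUBMIT_ARGS settings above, you can run it with plain python wordcount.py; $SPARK_HOME/bin/spark-submit wordcount.py also works and doesn’t rely on the PYTHONPATH changes.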

You should then see some INFO messages from Spark, followed by the word counting results.
