Installing Apache Spark 2.3.0 on macOS High Sierra
Apache Spark 2.3.0 was released on 28 February 2018. This tutorial guides you through the essential steps of installing it on macOS High Sierra.
Step 1: List of Downloads
As clearly mentioned in Spark’s documentation, in order to run Apache Spark 2.3.0 you need “Java 8+, Python 2.7+/3.4+ and R 3.1+. For the Scala API, Spark 2.3.0 uses Scala 2.11”. The download links below are for JDK 8u162, Scala 2.11.12, Sbt 0.13.17, and Python 3.6.4.
- spark-2.3.0-bin-hadoop2.7.tgz
- jdk-8u162-macosx-x64.dmg
- scala-2.11.12.tgz
- sbt-0.13.17.tgz
- python-3.6.4-macosx10.6.pkg. Although optional (macOS ships with a built-in Python), installing your own Python is recommended.
Step 2: Installation Preparations
2.1 The HOME folder of this tutorial
- The HOME folder of this tutorial is /Users/luckspark. This home directory can also be referred to as $HOME or ~. Therefore, /Users/luckspark/server, $HOME/server, and ~/server are all the same.
- Your HOME folder is probably different, so it is important that you replace every /Users/luckspark/server to match the HOME directory on your machine.
- For example, you might want to replace all /Users/luckspark/server with /Users/tim/server.
2.2 The installation folder of this tutorial
In this tutorial, Sbt, Scala, and Spark will be installed at /Users/luckspark/server (i.e., $HOME/server or ~/server). You can create the server directory under your HOME using the following commands:
cd
mkdir server
- Note for beginners: the command cd (with no arguments) changes the directory, from wherever it is, to the HOME directory. The commands above thus change the directory back to HOME, then create a new directory named “server”.
2.3 Copy all downloaded files to $HOME/server folder
- For simplicity, copy all downloaded files from step 1 to the $HOME/server folder. Your server folder should then contain all five downloaded files. If you prefer the Terminal, see the sketch below.
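A copy command along these lines should do the job. This is a sketch that assumes your browser saved the downloads to ~/Downloads; adjust the source folder if yours differs.
cd ~/Downloads
cp spark-2.3.0-bin-hadoop2.7.tgz scala-2.11.12.tgz sbt-0.13.17.tgz \
   jdk-8u162-macosx-x64.dmg python-3.6.4-macosx10.6.pkg ~/server/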
Step 3: Extract the downloaded files
- Extract the .tgz files (sbt*.tgz, spark*.tgz, and scala*.tgz) by double-clicking each file, which launches the Archive Utility program and extracts the files automatically.
- There will be 3 new folders, each of which corresponds to one of the .tgz files. If you would rather extract from the Terminal, see the sketch below.
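tar can do the same job as Archive Utility; a minimal sketch, assuming the archives are in ~/server as in step 2.3:
cd ~/server
tar -xzf spark-2.3.0-bin-hadoop2.7.tgz
tar -xzf scala-2.11.12.tgz
tar -xzf sbt-0.13.17.tgz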
Step 4: Install JDK
- Double click the jdk-8u162-macosx-x64.dmg to launch the JDK installation process.
- Double click the JDK 8 Update 162.pkg icon to install. The installation wizard screen will pop up.
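Optionally, you can confirm where the JDK landed. macOS ships a java_home utility that reports installed JDKs; with JDK 8u162 it should print the path used later in step 6.1:
/usr/libexec/java_home -v 1.8
# expected: /Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home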
Step 5: Install Python 3
- Double click the python-3.6.4-macosx10.6.pkg file to start the Python 3 installation. Follow the wizard screens with the default options.
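Optionally, confirm the installation from the Terminal (you may need to open a new Terminal window first, since the installer adds python3 to the PATH via .bash_profile):
python3 --version
# expected: Python 3.6.4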
Step 6: Set up the shell environment by editing the ~/.bash_profile file
6.1 Summary of directory paths
Here are the directory paths of the programs that we have installed so far:
- JDK: /Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk
- Python: /Library/Frameworks/Python.framework/Versions/3.6
- Sbt: /Users/luckspark/server/sbt
- Scala: /Users/luckspark/server/scala-2.11.12
- Spark: /Users/luckspark/server/spark-2.3.0-bin-hadoop2.7
These paths will be used in step 6.2. Make sure to replace /Users/luckspark/server to match your own HOME directory. You do not need to modify the JDK and Python paths.
6.2 Set up the .bash_profile file
- Note for beginners: this is a special file whose name deliberately starts with a “dot”. Make sure that you type the file name correctly, which is .bash_profile (with a “dot” in front).
- Open the .bash_profile file, which is located in your HOME directory (i.e., ~/.bash_profile), using any text editor (e.g., TextEdit, nano, vi, or Sublime). For example, open the Terminal app and use these commands to open .bash_profile with Mac’s TextEdit app.
cd
touch .bash_profile
open -a TextEdit .bash_profile
- Note for beginners: the commands above 1) change the directory back to the HOME directory, 2) create a file named .bash_profile if it does not exist (if the file already exists, touch will not overwrite its contents), and 3) open .bash_profile with the TextEdit app.
- Copy these lines to the file.
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home/
export SPARK_HOME=/Users/luckspark/server/spark-2.3.0-bin-hadoop2.7
export SBT_HOME=/Users/luckspark/server/sbt
export SCALA_HOME=/Users/luckspark/server/scala-2.11.12
export PATH=$JAVA_HOME/bin:$SBT_HOME/bin:$SBT_HOME/lib:$SCALA_HOME/bin:$SCALA_HOME/lib:$PATH
export PATH=$JAVA_HOME/bin:$SPARK_HOME:$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
export PYSPARK_PYTHON=python3
- The .bash_profile file should now contain all of the export lines above.
- Note that lines setting the PATH for Python could have been added automatically by the Python installation process. If not, you can copy and paste these lines manually:
PATH="/Library/Frameworks/Python.framework/Versions/3.6/bin:${PATH}"
export PATH
- Note for beginners: the lines starting with “#” in the .bash_profile are comment lines.
- Save and close the file.
Step 7: Reload .bash_profile
Since the .bash_profile has been changed, we have to reload it. There are two options:
- Type source ~/.bash_profile
OR
- Quit and reopen the Terminal program. Make sure you completely quit Terminal using menu → Quit Terminal (⌘Q); otherwise the environment variables declared above will not be loaded.
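Either way, you can verify that the new variables are in effect by echoing one of them and asking the shell where it now finds spark-shell; the expected values below assume the paths from step 6.1:
echo $SPARK_HOME
# expected: /Users/luckspark/server/spark-2.3.0-bin-hadoop2.7
which spark-shell
# expected: /Users/luckspark/server/spark-2.3.0-bin-hadoop2.7/bin/spark-shell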
Step 8: Test the installation
- Open the Terminal app.
8.1 Test Java
- Type java -version, which shall print the version of the installed JDK, as sketched below.
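With JDK 8u162 the first line of the output should look like this sketch (the exact build details may differ on your machine):
java -version
# java version "1.8.0_162"
# (followed by the Java runtime and HotSpot VM build details)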
8.2 Test PySpark
- PySpark is Spark’s Python interactive shell.
- At the Terminal, type pyspark; you shall get the Spark welcome banner showing version 2.3.0, followed by the >>> prompt (an optional smoke test follows this list).
- Type CTRL-D or exit() to exit the pyspark shell.
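Before exiting, you can optionally run a tiny job at the >>> prompt to confirm that Spark executes work end to end. This is a minimal sketch; both spark (a SparkSession) and sc (a SparkContext) are predefined by the pyspark shell:
>>> spark.range(1000).count()
1000
>>> sc.parallelize([1, 2, 3, 4]).sum()
10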
8.3: Test spark-shell
- Spark-shell is Spark’s interactive shell for Scala.
- At the Terminal, type spark-shell; you shall get the Spark banner followed by the scala> prompt (an optional smoke test follows this list).
- Type CTRL-D to quit spark-shell.
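As with pyspark, you can optionally run a one-liner at the scala> prompt before quitting; spark is the SparkSession that spark-shell predefines:
scala> spark.range(1000).count()
res0: Long = 1000
For one more end-to-end check back at the Terminal, the Spark distribution bundles example jobs; run-example is a script in $SPARK_HOME/bin (already on your PATH from step 6.2), and SparkPi should print a line like “Pi is roughly 3.14...”:
run-example SparkPi 10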

