Installing Apache Spark 2.3.0 on macOS High Sierra
Apache Spark 2.3.0 was released on 28 February 2018. This tutorial guides you through the essential installation steps on macOS High Sierra.
Step 1: List of Downloads
As stated in Spark’s documentation, to run Apache Spark 2.3.0 you need “Java 8+, Python 2.7+/3.4+ and R 3.1+. For the Scala API, Spark 2.3.0 uses Scala 2.11”. The download links below are for JDK 8u162, Scala 2.11.12, Sbt 0.13.17, and Python 3.6.4.
- spark-2.3.0-bin-hadoop2.7.tgz (us link, eu link)
- jdk-8u162-macosx-x64.dmg
- scala-2.11.12.tgz
- sbt-0.13.17.tgz
- python-3.6.4-macosx10.6.pkg. Although optional (as macOS has built-in Python), it is recommended to install your own Python.
Step 2: Installation Preparations
2.1 The HOME folder of this tutorial
- The home folder of this tutorial is /Users/luckspark. This home directory can also be referred to as $HOME or ~. Therefore, /Users/luckspark/server, $HOME/server, and ~/server are the same.
- Your HOME folder is probably different. Therefore, it is important that you replace every /Users/luckspark/server with the path that matches the HOME directory on your machine.
- For example, you might want to replace every /Users/luckspark/server with /Users/tim/server.
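To see the equivalence described above on your own machine, a minimal sketch like the following can be pasted into the Terminal. The variable names server_dir and tilde_dir are made up for this illustration.

```shell
# Build the same path in two ways: with $HOME and with ~
server_dir="$HOME/server"
tilde_dir=~/server

echo "$server_dir"
echo "$tilde_dir"

# Both spellings expand to the same absolute path
if [ "$server_dir" = "$tilde_dir" ]; then
  echo "same path"
fi
```

Note that ~ is only expanded when it is not inside quotes, which is why the second assignment is left unquoted.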
2.2 The installation folder of this tutorial
In this tutorial, Sbt, Scala, and Spark will be installed at /Users/luckspark/server (i.e., $HOME/server or ~/server). You can create the server directory under your HOME using the following commands:
cd
mkdir server
- Note for beginners: the command cd changes the directory (from wherever it is) to the HOME directory. The commands above thus change the directory back to HOME, then create a new directory named “server”.
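If you are not sure whether the server directory already exists, a slightly more defensive variant of the commands above is sketched here. It does the same job in one step and can be run repeatedly without error.

```shell
# -p creates the directory only if it is missing and does not fail when it already exists
mkdir -p "$HOME/server"

# Confirm that the directory is there
ls -d "$HOME/server"
```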
2.3 Copy all downloaded files to the $HOME/server folder
- For simplicity, copy all downloaded files from step 1 to the $HOME/server folder. Your server folder should then contain the five files listed in step 1.
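The copy can also be done from the Terminal. The sketch below assumes that your browser saved the files to $HOME/Downloads (an assumption, adjust it if your downloads go elsewhere); copy_downloads is a hypothetical helper name used only for this example.

```shell
# Copy the tutorial's downloads from a source folder to a destination folder.
# Files that are not found are reported instead of causing an error.
copy_downloads() {
  src="$1"
  dest="$2"
  mkdir -p "$dest"
  for f in \
    spark-2.3.0-bin-hadoop2.7.tgz \
    jdk-8u162-macosx-x64.dmg \
    scala-2.11.12.tgz \
    sbt-0.13.17.tgz \
    python-3.6.4-macosx10.6.pkg
  do
    if [ -f "$src/$f" ]; then
      cp "$src/$f" "$dest/"
      echo "copied: $f"
    else
      echo "not found in $src: $f"
    fi
  done
}

# Assumption: the downloads are in $HOME/Downloads
copy_downloads "$HOME/Downloads" "$HOME/server"
```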
Step 3: Extract the downloaded files
- Extract the .tgz files (sbt*.tgz, spark*.tgz, and scala*.tgz) by double-clicking each file, which launches the Archive Utility program and extracts the files automatically.
- There will be 3 new folders, one for each .tgz file: sbt, scala-2.11.12, and spark-2.3.0-bin-hadoop2.7.
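As an alternative to double-clicking, the archives can be extracted with tar from the Terminal. This is a sketch that assumes the .tgz files sit in $HOME/server as in step 2.3; extract_archives is a hypothetical helper name.

```shell
# Extract every .tgz archive found in the given directory into that same directory
extract_archives() {
  dir="$1"
  for archive in "$dir"/*.tgz; do
    # Skip the unexpanded pattern when no .tgz file is present
    [ -f "$archive" ] || continue
    tar -xzf "$archive" -C "$dir"
    echo "extracted: $(basename "$archive")"
  done
}

extract_archives "$HOME/server"
```

The -x flag extracts, -z handles the gzip compression, -f names the archive, and -C sets the directory to extract into.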
Step 4: Install JDK
- Double-click jdk-8u162-macosx-x64.dmg to launch the JDK installation process.
- Double-click the JDK 8 Update 162.pkg icon to install. The installation wizard screen will pop up.
Step 5: Install Python 3
- Double-click the python-3.6.4-macosx10.6.pkg file to start the Python 3 installation. Follow the wizard screens with the default options.
Step 6: Set up the shell environment by editing the ~/.bash_profile file
6.1 Summary of directory paths
Here are the directory paths of the programs that we have installed so far:
- JDK: /Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk
- Python: /Library/Frameworks/Python.framework/Versions/3.6
- Sbt: /Users/luckspark/server/sbt
- Scala: /Users/luckspark/server/scala-2.11.12
- Spark: /Users/luckspark/server/spark-2.3.0-bin-hadoop2.7
These paths will be used in step 6.2. Make sure to replace /Users/luckspark/server with the matching path under your own HOME directory. You do not need to modify the paths of JDK and Python.
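Before editing .bash_profile, it can help to confirm that these directories really exist. The following is a minimal sketch that assumes the versions and locations used in this tutorial; check_dir is a hypothetical helper name.

```shell
# Print whether a directory exists
check_dir() {
  if [ -d "$1" ]; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}

check_dir /Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk
check_dir /Library/Frameworks/Python.framework/Versions/3.6
check_dir "$HOME/server/sbt"
check_dir "$HOME/server/scala-2.11.12"
check_dir "$HOME/server/spark-2.3.0-bin-hadoop2.7"
```

Any line that starts with "missing" points to a step above that needs to be repeated or a path that differs on your machine.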
6.2 Set up the .bash_profile file
- Note for beginners: this is a special file whose name deliberately starts with a “dot”. Therefore, make sure that you type the file name correctly, which is .bash_profile (with a “dot” in front).
- Open the .bash_profile file, which is located in your HOME directory (i.e., ~/.bash_profile), using any text editor (e.g., TextEdit, nano, vi, or Sublime). For example, open the Terminal app and use these commands to open .bash_profile with Mac’s TextEdit app.
cd
touch .bash_profile
open -a TextEdit .bash_profile
- Note for beginners: the commands above 1) change the directory back to the HOME directory, 2) create a file named .bash_profile if it does not exist (if the file already exists, it will not be overwritten), and 3) open .bash_profile with the TextEdit app.
- Copy these lines to the file.
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home/
export SPARK_HOME=/Users/luckspark/server/spark-2.3.0-bin-hadoop2.7
export SBT_HOME=/Users/luckspark/server/sbt
export SCALA_HOME=/Users/luckspark/server/scala-2.11.12
export PATH=$JAVA_HOME/bin:$SBT_HOME/bin:$SBT_HOME/lib:$SCALA_HOME/bin:$SCALA_HOME/lib:$PATH
export PATH=$JAVA_HOME/bin:$SPARK_HOME:$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
export PYSPARK_PYTHON=python3
- After this step, the .bash_profile file should contain the export lines above.
- Note that the Python installation process may have automatically added lines at the end of the file that put Python on the PATH. If not, you can copy and paste these lines manually.
PATH="/Library/Frameworks/Python.framework/Versions/3.6/bin:${PATH}"
export PATH
- Note for beginners: the lines starting with “#” in .bash_profile are comment lines.
- Save and close the file.
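The export lines in step 6.2 hard-code /Users/luckspark. One possible variant, sketched here, builds the same paths from $HOME so that they need no editing on another account. SERVER_DIR is a hypothetical helper variable that is not part of the original file, and the two PATH lines are merged into one with the same entries.

```shell
# Assumption: sbt, Scala, and Spark were extracted under $HOME/server as in this tutorial
SERVER_DIR="$HOME/server"   # hypothetical helper variable

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home
export SPARK_HOME="$SERVER_DIR/spark-2.3.0-bin-hadoop2.7"
export SBT_HOME="$SERVER_DIR/sbt"
export SCALA_HOME="$SERVER_DIR/scala-2.11.12"

# Same PATH entries as the original two lines, written once
export PATH="$JAVA_HOME/bin:$SBT_HOME/bin:$SBT_HOME/lib:$SCALA_HOME/bin:$SCALA_HOME/lib:$SPARK_HOME:$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH"
export PYSPARK_PYTHON=python3
```

Use either this variant or the original lines in .bash_profile, not both, otherwise the same directories are added to PATH twice.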
Step 7: Reload .bash_profile
Since .bash_profile has been changed, we have to reload it. The options are:
- Type source ~/.bash_profile, OR
- Quit and reopen the Terminal program. Make sure you completely quit the Terminal using menu → Quit Terminal (⌘Q); otherwise the environment variables declared above will not be loaded.
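To confirm that the reload worked, you can print the variables defined in step 6.2. This is a small sketch; report_var is a hypothetical helper name.

```shell
# Print NAME=value for a variable, or say that it is not set
report_var() {
  # Look up the variable whose name is passed as the first argument
  eval "value=\${$1:-}"
  if [ -n "$value" ]; then
    echo "$1=$value"
  else
    echo "$1 is not set"
  fi
}

for name in JAVA_HOME SPARK_HOME SBT_HOME SCALA_HOME PYSPARK_PYTHON; do
  report_var "$name"
done
```

If any variable is reported as not set, check .bash_profile for typos and reload it again.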
Step 8: Test the installation
- Open the Terminal app.
8.1 Test Java
- Type java -version. The output should report Java version 1.8.0_162.
8.2 Test PySpark
- PySpark is Spark’s Python interactive shell.
- At the Terminal, type pyspark. You should see the Spark banner with version 2.3.0.
- Type CTRL-D or exit() to exit the pyspark shell.
8.3 Test spark-shell
- spark-shell is Spark’s interactive shell for Scala.
- At the Terminal, type spark-shell. You should see the Spark banner followed by a scala> prompt.
- Type CTRL-D to quit spark-shell.
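If one of the tests above fails with "command not found", the tool is probably not on your PATH. The sketch below reports where each command is found; find_tool is a hypothetical helper name, and the list of tools assumes the setup from step 6.

```shell
# Report where a command is found on the PATH, if anywhere
find_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1 -> $(command -v "$1")"
  else
    echo "not found: $1"
  fi
}

for tool in java python3 scala sbt pyspark spark-shell; do
  find_tool "$tool"
done
```

A "not found" line usually means the matching entry in .bash_profile is missing or misspelled, or that the file has not been reloaded yet.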