Installing Apache Spark 2.3.0 on macOS High Sierra

Luck Charoenwatana
LuckSpark
5 min read · Mar 6, 2018


Apache Spark 2.3.0 was released on 28 February 2018. This tutorial guides you through the essential installation steps on macOS High Sierra.

Read the Thai version here.

Step 1: List of Downloads

As clearly mentioned in Spark’s documentation, in order to run Apache Spark 2.3.0 you need “Java 8+, Python 2.7+/3.4+ and R 3.1+. For the Scala API, Spark 2.3.0 uses Scala 2.11”. The download links below are for JDK 8u162, Scala 2.11.12, Sbt 0.13.17, and Python 3.6.4.

Java JDK 8 download page: choose the macOS version.
Python download page. If you have old code that specifically requires version 2.x, choose 2.7.x. If you are new to Python or Spark, choose 3.x (i.e., download version 3.6.4 here). You can always switch between Python 2.x and 3.x later.

Step 2: Installation Preparations

2.1 The HOME folder of this tutorial

  • Home folder of this tutorial is /Users/luckspark. This home directory can also be referred to as $HOME or ~. Therefore, /Users/luckspark/server and $HOME/server and ~/server are the same.
  • Your HOME folder is probably different. Therefore, it is important that you replace every /Users/luckspark with the HOME directory on your machine.
  • For example, you might replace every /Users/luckspark/server with /Users/tim/server.

2.2 The installation folder of this tutorial

In this tutorial, Sbt, Scala, and Spark will be installed at /Users/luckspark/server (i.e., $HOME/server or ~/server). You can create the server directory under your HOME using the following commands:

cd
mkdir server
  • Note for beginners: run with no arguments, the command cd changes the directory (from wherever you are) to your HOME directory. The commands above thus change to HOME, then create a new directory named “server”.

2.3 Copy all downloaded files to $HOME/server folder

  • For simplicity, copy all downloaded files from step 1 to the $HOME/server folder. Your server folder shall look like this.
Copy all downloaded files to the $HOME/server directory.
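If you prefer the Terminal to Finder, the copy can be sketched as below. This is only an illustration: it assumes the downloads landed in ~/Downloads, the SRC and DEST variables and the copy_downloads function are names made up for this example, and the file patterns simply follow the file names used in this tutorial.

```shell
# Assumption: the browser saved the files in ~/Downloads; change SRC if not.
SRC="${SRC:-$HOME/Downloads}"
DEST="${DEST:-$HOME/server}"

copy_downloads() {
  # -p creates the folder only if it is missing, so re-running is safe.
  mkdir -p "$DEST"
  for f in "$SRC"/jdk-8u*.dmg "$SRC"/python-3*.pkg \
           "$SRC"/sbt*.tgz "$SRC"/scala*.tgz "$SRC"/spark*.tgz; do
    # A pattern that matches nothing stays literal, so copy real files only.
    [ -f "$f" ] && cp -v "$f" "$DEST"/
  done
  return 0
}

copy_downloads
```

Files that do not match any of the patterns are left where they are.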

Step 3: Extract the downloaded files

  • Extract the .tgz files (sbt*.tgz, spark*.tgz, and scala*.tgz) by double-clicking each file, which launches the Archive Utility program and extracts the files automatically.
The Archive Utility program extracts the files automatically.
  • There will be 3 new folders, one for each .tgz file, as shown below.
$HOME/server after extracting all .tgz files.
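The same extraction can also be done from the Terminal with tar instead of Archive Utility. The sketch below is an alternative to the double-click method, not a required step; extract_all is a helper name invented for this example.

```shell
# Extract every .tgz archive found in a folder (default: ~/server).
extract_all() {
  dir="${1:-$HOME/server}"
  for archive in "$dir"/*.tgz; do
    # If no archive matches, the pattern stays literal; skip it.
    [ -f "$archive" ] || continue
    # -x extract, -z gunzip, -f archive file, -C directory to extract into
    tar -xzf "$archive" -C "$dir"
  done
}

extract_all "$HOME/server"
```

Each archive unpacks into its own folder next to the .tgz file, which matches the layout the rest of this tutorial expects.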

Step 4: Install JDK

  • Double click the jdk-8u162-macosx-x64.dmg file to launch the JDK installation process.
  • Double click the JDK 8 Update 162.pkg icon to install. The installation wizard screen will pop up.
JDK installation. Double click the box icon to begin the installation.
Installation wizard screens. Just follow the default options.

Step 5: Install Python 3

  • Double click the python-3.6.4-macosx10.6.pkg file to start the Python 3 installation. Follow the wizard screens with default options.
Python 3 installation wizard screen. Follow the default options.

Step 6: Set up the shell environment by editing the ~/.bash_profile file

6.1 Summary of directory paths

Here are the directory paths of the programs that we have installed so far:

  • JDK: /Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk
  • Python: /Library/Frameworks/Python.framework/Versions/3.6
  • Sbt: /Users/luckspark/server/sbt
  • Scala: /Users/luckspark/server/scala-2.11.12
  • Spark: /Users/luckspark/server/spark-2.3.0-bin-hadoop2.7

These paths will be used in step 6.2. Make sure to replace /Users/luckspark with your own HOME path. You do not need to modify the paths of JDK and Python.
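Before editing .bash_profile, it can help to confirm that each of these directories really exists. The following is a minimal sketch; check_dirs is a hypothetical helper written for this example, and the Sbt, Scala, and Spark paths are expressed with $HOME so they follow your own home directory.

```shell
# Report whether each directory passed as an argument exists.
check_dirs() {
  missing=0
  for d in "$@"; do
    if [ -d "$d" ]; then
      echo "OK       $d"
    else
      echo "MISSING  $d"
      missing=1
    fi
  done
  # Non-zero status when at least one directory was not found.
  return "$missing"
}

check_dirs \
  /Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk \
  /Library/Frameworks/Python.framework/Versions/3.6 \
  "$HOME/server/sbt" \
  "$HOME/server/scala-2.11.12" \
  "$HOME/server/spark-2.3.0-bin-hadoop2.7" \
  || echo "Fix the missing paths before moving on to step 6.2."
```

A MISSING line usually means an archive was not extracted or a different version was installed.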

6.2 Set up the .bash_profile file

  • Note for beginners: this is a special file whose name deliberately starts with a “dot”. Therefore, make sure that you type the file name correctly, which is .bash_profile (with a “dot” in front).
  • Open the .bash_profile file, which is located in your HOME directory (i.e., ~/.bash_profile), using any text editor (e.g., TextEdit, nano, vi, or Sublime Text). For example, open the Terminal app and use these commands to open .bash_profile with Mac’s TextEdit app.
cd
touch .bash_profile
open -a TextEdit .bash_profile
  • Note for beginners: the commands above 1) change the directory back to HOME, 2) create a file named .bash_profile if it does not exist (an existing file is not overwritten), and 3) open .bash_profile with the TextEdit app.
Open .bash_profile using TextEdit app. Your .bash_profile could be blank or contain different texts.
  • Copy these lines to the file.
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home/
export SPARK_HOME=/Users/luckspark/server/spark-2.3.0-bin-hadoop2.7
export SBT_HOME=/Users/luckspark/server/sbt
export SCALA_HOME=/Users/luckspark/server/scala-2.11.12
export PATH=$JAVA_HOME/bin:$SBT_HOME/bin:$SBT_HOME/lib:$SCALA_HOME/bin:$SCALA_HOME/lib:$PATH
export PATH=$JAVA_HOME/bin:$SPARK_HOME:$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
export PYSPARK_PYTHON=python3
  • The .bash_profile file shall look like this.
~/.bash_profile with environment variables configured. Again, do not forget to replace all /Users/luckspark/server to match your installation path.
  • Note that the last 4 lines set the path for Python and may have been added automatically by the Python installer. If not, you can copy and paste these lines manually.
PATH="/Library/Frameworks/Python.framework/Versions/3.6/bin:${PATH}"
export PATH
  • Note for beginners, the lines starting with “#” in the .bash_profile are comment lines.
  • Save and close the file.
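As a variation on the export lines in step 6.2, the sketch below writes the same settings with $HOME, so /Users/luckspark does not have to be replaced by hand. It also keeps only the bin and sbin folders in PATH, since PATH is used to look up commands. That trimming is my own simplification, not part of the original setup, and the original lines work as written.

```shell
# Same variables as in step 6.2, with $HOME in place of /Users/luckspark.
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home
export SPARK_HOME="$HOME/server/spark-2.3.0-bin-hadoop2.7"
export SBT_HOME="$HOME/server/sbt"
export SCALA_HOME="$HOME/server/scala-2.11.12"

# One PATH line covering the command folders of all four tools.
export PATH="$JAVA_HOME/bin:$SBT_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH"

# Tell PySpark to use Python 3.
export PYSPARK_PYTHON=python3
```

Use either this variant or the original lines in .bash_profile, not both.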

Step 7: Reload .bash_profile

Since .bash_profile has been changed, we have to reload it. There are two options:

  • Type source ~/.bash_profile

OR

  • Quit and reopen the Terminal program. Make sure you completely quit the Terminal using menu → Quit Terminal (⌘Q), otherwise the environment variables declared above will not be loaded.
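To confirm that the reload worked, you can print the variables defined in step 6.2. The snippet below is a small sketch; show_vars is a helper name made up for this example.

```shell
# Print each named variable, flagging any that is empty or unset.
show_vars() {
  for name in "$@"; do
    # Portable indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      echo "$name=$value"
    else
      echo "$name is not set"
    fi
  done
}

show_vars JAVA_HOME SPARK_HOME SBT_HOME SCALA_HOME PYSPARK_PYTHON
```

If any line reports "is not set", recheck .bash_profile for typos and reload it again.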

Step 8: Test the installation

  • Open the Terminal app.

8.1 Test Java

  • Type java -version, which should print output like the screen below.
Output of java -version.
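If you want to check the version in a script rather than by eye, the sketch below pulls the quoted version string out of the first line of the output. java_version_of is a hypothetical helper for this example. With the JDK installed in step 4, the first line should read java version "1.8.0_162", so the extracted string would be 1.8.0_162.

```shell
# Extract the quoted version string from the first line of `java -version` output.
java_version_of() {
  printf '%s\n' "$1" | head -n 1 | sed 's/.*"\(.*\)".*/\1/'
}

if command -v java >/dev/null 2>&1; then
  # java -version writes to stderr, hence the 2>&1 redirect.
  java_version_of "$(java -version 2>&1)"
else
  echo "java not found on PATH"
fi
```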

8.2 Test PySpark

  • PySpark is Spark’s interactive shell for Python.
  • In the Terminal, type pyspark. You should get the following screen showing the Spark banner with version 2.3.0.
Output screen of pyspark.
  • Type CTRL-D or exit() to exit the pyspark shell.

8.3 Test spark-shell

  • spark-shell is Spark’s interactive shell for Scala.
  • In the Terminal, type spark-shell. You should get the following screen output.
Output screen of spark-shell.
  • Type CTRL-D to quit spark-shell.
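Besides the two interactive shells, a non-interactive smoke test can be handy. Spark distributions ship a run-example launcher in their bin folder together with a SparkPi example, and the sketch below runs it if it can be found through SPARK_HOME. spark_smoke_test is a helper name invented for this example.

```shell
# Run the bundled SparkPi example, if a Spark install is found via SPARK_HOME.
spark_smoke_test() {
  launcher="${SPARK_HOME:-}/bin/run-example"
  if [ ! -x "$launcher" ]; then
    echo "run-example not found; is SPARK_HOME set?" >&2
    return 1
  fi
  # 10 is the number of partitions SparkPi spreads its sampling over.
  "$launcher" SparkPi 10
}

spark_smoke_test || echo "Smoke test did not run."
```

On a working installation the job prints an estimate of Pi among its log lines.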

That’s it. Enjoy your Spark.
