Installing Apache Spark 2.3.0 on macOS High Sierra

Luck Charoenwatana

Published in

LuckSpark

5 min read · Mar 6, 2018

Apache Spark 2.3.0 was released on 28 February 2018. This tutorial guides you through the essential installation steps on macOS High Sierra.

You can read the Thai version here.

Step 1: List of Downloads

As stated in Spark’s documentation, Apache Spark 2.3.0 runs on “Java 8+, Python 2.7+/3.4+ and R 3.1+. For the Scala API, Spark 2.3.0 uses Scala 2.11”. The download links below are for JDK 8u162, Scala 2.11.12, sbt 0.13.17, and Python 3.6.4.

spark-2.3.0-bin-hadoop2.7.tgz (us link, eu link)
jdk-8u162-macosx-x64.dmg
scala-2.11.12.tgz
sbt-0.13.17.tgz
python-3.6.4-macosx10.6.pkg. Although optional (as macOS has built-in Python), it is recommended to install your own Python.

The JDK is available from the Java JDK 8 download page; choose the macOS version.

Python is available from the Python download page. If you have old code that specifically requires version 2.x, choose 2.7.x. If you are new to Python or Spark, choose 3.x (this tutorial uses version 3.6.4). You can always switch between Python 2.x and 3.x later.

Step 2: Installation Preparations

2.1 The HOME folder of this tutorial

The home folder used in this tutorial is /Users/luckspark. This home directory can also be referred to as $HOME or ~. Therefore, /Users/luckspark/server, $HOME/server, and ~/server are all the same.
Your HOME folder is probably different. Therefore, it is important that you replace every /Users/luckspark/server with the matching path under the HOME directory on your machine.
For example, you might replace every /Users/luckspark/server with /Users/tim/server.
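If you are unsure what your own home path is, the Terminal can tell you. The commands below are a small sketch that only prints paths; nothing is created or changed.

```shell
# Print your home directory, e.g. /Users/tim (the name "tim" is just an example).
echo "$HOME"

# Both spellings below expand to the same absolute path.
echo "$HOME/server"
echo ~/server
```

Whatever the first command prints is the value to use in place of /Users/luckspark throughout this tutorial.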

2.2 The installation folder of this tutorial

In this tutorial, sbt, Scala, and Spark will be installed in /Users/luckspark/server (i.e., $HOME/server or ~/server). You can create the server directory under your HOME with the following commands:

cd
mkdir server

Note for beginners: the command cd, typed on its own, changes the current directory (wherever it is) to your HOME directory. The commands above thus go back to HOME and then create a new directory named “server”.

2.3 Copy all downloaded files to `$HOME/server` folder

For simplicity, copy all downloaded files from step 1 to the $HOME/server folder. Your server folder should look like this.

All downloaded files copied to the **$HOME/server** directory
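If you prefer the Terminal to dragging files in Finder, the copy can be sketched as a small shell function. This assumes the files were downloaded into a single folder (on macOS that is usually ~/Downloads, but that is an assumption, not a requirement of this tutorial) and that the destination folder already exists. The function name copy_downloads is made up for illustration.

```shell
# Hypothetical helper: copy the five Step 1 downloads from one folder to another.
# Usage: copy_downloads <from-folder> <to-folder>
copy_downloads() {
  for f in \
    spark-2.3.0-bin-hadoop2.7.tgz \
    jdk-8u162-macosx-x64.dmg \
    scala-2.11.12.tgz \
    sbt-0.13.17.tgz \
    python-3.6.4-macosx10.6.pkg
  do
    if [ -f "$1/$f" ]; then
      # Copy the file and report it only if the copy succeeded.
      cp "$1/$f" "$2/" && echo "copied:    $f"
    else
      echo "not found: $f"
    fi
  done
}
```

For example, copy_downloads ~/Downloads ~/server copies whatever it finds and lists any file that still needs to be downloaded.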

Step 3: Extract the downloaded files

Extract the .tgz files (sbt*.tgz, spark*.tgz, and scala*.tgz) by double-clicking each file. This launches the Archive Utility program, which extracts the files automatically.

The Archive Utility program extracts the files automatically.

There will be 3 new folders, one for each .tgz file, as shown below.

$HOME/server after extracting all .tgz files.
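The same extraction can be done from the Terminal with tar instead of Archive Utility. The function below is a minimal sketch, not part of the original steps; the name extract_archives is hypothetical. It runs in a subshell so that your current directory is left unchanged.

```shell
# Hypothetical helper: extract every .tgz archive found in a folder, in place.
# Usage: extract_archives <folder>
extract_archives() (
  cd "$1" || exit 1
  for archive in *.tgz; do
    # Skip the literal pattern when the folder contains no .tgz files.
    [ -e "$archive" ] || continue
    echo "extracting $archive"
    # -x extract, -z gunzip, -f read from the named file.
    tar -xzf "$archive" || exit 1
  done
)
```

Running extract_archives "$HOME/server" followed by ls "$HOME/server" should then list the new folders next to the original archives.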

Step 4: Install JDK

Double-click jdk-8u162-macosx-x64.dmg to open the JDK disk image.
Then double-click the JDK 8 Update 162.pkg icon to install. The installation wizard will appear.

JDK installation. Double click the box icon to begin the installation.

Installation wizard screens. Just follow the default options.
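Once the wizard finishes, you can optionally confirm from the Terminal that macOS sees the new JDK. The sketch below relies on /usr/libexec/java_home, a tool that ships with macOS; the function name list_jdks is made up for illustration.

```shell
# Hypothetical helper: list the JDKs that macOS knows about.
list_jdks() {
  if [ -x /usr/libexec/java_home ]; then
    # -V lists every installed JDK; the list goes to standard error, hence 2>&1.
    /usr/libexec/java_home -V 2>&1 || echo "no JDK reported by java_home"
  else
    echo "/usr/libexec/java_home not found (this tool ships with macOS only)"
  fi
}
list_jdks
```

If the installation went well, the list should include an entry under /Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk.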

Step 5: Install Python 3

Double-click the python-3.6.4-macosx10.6.pkg file to start the Python 3 installation. Follow the wizard screens with the default options.

Python 3 installation wizard screen. Follow the default options.

Step 6: Set up the shell environment by editing the ~/.bash_profile file

6.1 Summary of directory paths

Here are the directory paths of the programs that we have installed so far:

JDK: /Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk
Python: /Library/Frameworks/Python.framework/Versions/3.6
Sbt: /Users/luckspark/server/sbt
Scala: /Users/luckspark/server/scala-2.11.12
Spark: /Users/luckspark/server/spark-2.3.0-bin-hadoop2.7

These paths will be used in step 6.2. Make sure to replace the /Users/luckspark/server with your HOME path. You do not need to modify the paths of JDK and Python.
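Before putting these paths into .bash_profile, it may be worth checking that each directory really exists, since a typo here only shows up later as a "command not found" error. The helper below is a sketch; the name check_dirs is hypothetical.

```shell
# Hypothetical helper: print whether each given directory exists.
# Usage: check_dirs <dir> [<dir> ...]; returns non-zero if any directory is missing.
check_dirs() {
  rc=0
  for d in "$@"; do
    if [ -d "$d" ]; then
      echo "ok:      $d"
    else
      echo "missing: $d"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, check_dirs "$HOME/server/sbt" "$HOME/server/scala-2.11.12" "$HOME/server/spark-2.3.0-bin-hadoop2.7" should print three "ok" lines after Step 3.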

6.2 Set up the .bash_profile file

Note for beginners: the name of this file deliberately starts with a “dot”, which makes it a hidden file. Therefore, make sure that you type the file name correctly as .bash_profile (with the “dot” in front).
Open the .bash_profile file, which is located at your HOME directory (i.e., ~/.bash_profile), using any text editor (e.g., TextEdit, nano, vi, or sublime). For example, open the Terminal app and use these commands to open the .bash_profile with Mac’s TextEdit app.

cd
touch .bash_profile
open -a TextEdit .bash_profile

Note for beginners: the commands above 1) change back to the HOME directory, 2) create a file named .bash_profile if it does not exist (an existing file is not overwritten), and 3) open .bash_profile with the TextEdit app.

Open .bash_profile using TextEdit app. Your .bash_profile could be blank or contain different texts.

Copy these lines to the file.

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home/
export SPARK_HOME=/Users/luckspark/server/spark-2.3.0-bin-hadoop2.7
export SBT_HOME=/Users/luckspark/server/sbt
export SCALA_HOME=/Users/luckspark/server/scala-2.11.12
export PATH=$JAVA_HOME/bin:$SBT_HOME/bin:$SBT_HOME/lib:$SCALA_HOME/bin:$SCALA_HOME/lib:$PATH
export PATH=$JAVA_HOME/bin:$SPARK_HOME:$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
export PYSPARK_PYTHON=python3

The .bash_profile file shall look like this.

~/.bash_profile with environment variables configured. Again, do not forget to replace every **/Users/luckspark/server** with your own installation path.
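As an alternative to editing the user name by hand, the same variables can be built from $HOME. This is only a sketch of one possible variant: SERVER_DIR is a helper variable introduced here for illustration, and the lib folders and the repeated entries are left out of PATH as a simplification, on the assumption that only the bin and sbin folders hold commands you will run.

```shell
# Same variables as above, but built from $HOME so no user name is hard-coded.
# SERVER_DIR is a helper variable introduced here for illustration only.
SERVER_DIR="$HOME/server"

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home
export SPARK_HOME="$SERVER_DIR/spark-2.3.0-bin-hadoop2.7"
export SBT_HOME="$SERVER_DIR/sbt"
export SCALA_HOME="$SERVER_DIR/scala-2.11.12"

# Put each tool's command folder in front of the existing PATH.
export PATH="$JAVA_HOME/bin:$SBT_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH"
export PYSPARK_PYTHON=python3
```

Use either this variant or the hard-coded lines, not both, so that PATH does not pick up the same folders twice.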

Note that the last 4 lines in the screenshot set the PATH for Python; the Python installer may have added them automatically. If not, you can copy and paste these lines manually.

PATH="/Library/Frameworks/Python.framework/Versions/3.6/bin:${PATH}"
export PATH

Note for beginners, the lines starting with “#” in the .bash_profile are comment lines.
Save and close the file.

Step 7: Reload .bash_profile

Since .bash_profile has been changed, we have to reload it. There are two options:

Type source ~/.bash_profile

OR

Quit and reopen the Terminal program. Make sure you completely quit the Terminal using menu → Quit Terminal (⌘Q), otherwise the environment variables declared above will not be loaded.
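Either way, you can check that the reload worked by printing the variables from Step 6. The function below is a small sketch (the name show_spark_env is made up); it prints "<not set>" for any variable that the shell has not picked up.

```shell
# Hypothetical helper: print the variables defined in .bash_profile.
# ${VAR:-<not set>} falls back to the text "<not set>" when VAR is empty or unset.
show_spark_env() {
  echo "JAVA_HOME=${JAVA_HOME:-<not set>}"
  echo "SPARK_HOME=${SPARK_HOME:-<not set>}"
  echo "SBT_HOME=${SBT_HOME:-<not set>}"
  echo "SCALA_HOME=${SCALA_HOME:-<not set>}"
  echo "PYSPARK_PYTHON=${PYSPARK_PYTHON:-<not set>}"
}
show_spark_env
```

If any line shows "<not set>", re-check the spelling in ~/.bash_profile and reload it again.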

Step 8: Test the installation

Open the Terminal app.

8.1 Test Java

Type java -version, which should report Java version 1.8.0_162, as in the screen below.
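One detail that can surprise beginners is that java -version writes its output to standard error rather than standard output, so it has to be redirected before it can be filtered. The sketch below prints only the first line of that output; the function name check_java is hypothetical.

```shell
# Hypothetical helper: print the first line of the Java version banner.
check_java() {
  if command -v java >/dev/null 2>&1; then
    # java -version prints to standard error, so 2>&1 is needed before the pipe.
    java -version 2>&1 | head -n 1
  else
    echo "java not found on PATH"
  fi
}
check_java
```

With the JDK from this tutorial installed, the printed line should mention 1.8.0_162.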

8.2 Test PySpark

PySpark is Spark’s Python interactive shell.
At the Terminal, type pyspark. You should see the following screen showing the Spark banner with version 2.3.0.
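If pyspark is reported as "command not found", the PATH from Step 6 is the usual suspect. The sketch below checks whether the Spark launch scripts can be found from the current PATH; the function name check_spark is made up for illustration.

```shell
# Hypothetical helper: check that the Spark launch scripts are reachable from PATH.
check_spark() {
  for cmd in pyspark spark-shell spark-submit; do
    if command -v "$cmd" >/dev/null 2>&1; then
      # command -v prints the full path of the script that would be run.
      echo "ok:      $cmd -> $(command -v "$cmd")"
    else
      echo "missing: $cmd"
    fi
  done
}
check_spark
```

All three should point into the bin folder under your SPARK_HOME. Running spark-submit --version is another way to see the version banner without opening an interactive shell.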