Setting up Spark 2.0.1 and Zeppelin 0.6.2 on macOS Sierra

Luck Charoenwatana
LuckSpark

--

At the time of writing, Zeppelin 0.6.2 is compatible with Spark 2.0.1.

Spark 2.1.0 has already been released, but it is not yet compatible with Zeppelin 0.6.2; errors still occur when using Zeppelin with Spark 2.1.0.

Step 1: Download all Required Software

  1. Java JDK 8
  2. Python 3.5.x
  3. SBT 0.13.x
  4. Spark 2.0.1
  5. Zeppelin 0.6.2

Note on the HOME directory of this tutorial

  • The home directory used in this tutorial is /Users/myaccount. This directory is referred to as HOME or ~ throughout. (In the shell (Terminal), ~ also means HOME.)

In this tutorial
/Users/myaccount/Downloads = HOME/Downloads = ~/Downloads

  • It is important that you adjust the HOME directory according to your own HOME path in all of the following configurations.
  • For example, you might have to change /Users/myaccount/Downloads to /Users/tim/Downloads.
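
You can check what your own HOME path is directly from the Terminal:

echo $HOME

This prints your home directory, for example /Users/tim.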

Step 2: Extract the downloaded files

  • It is assumed that all files are downloaded to the ~/Downloads directory of your Mac (i.e., /Users/myaccount/Downloads), as shown below.
The downloaded files saved in the HOME/Downloads directory.
  • Extract the .tgz files (sbt*.tgz, spark*.tgz, zeppelin*.tgz) by double-clicking each file, which launches the Archive Utility program and extracts the files automatically (a Terminal alternative using tar is shown below).
File extraction by Archive Utility.
The sbt, spark, and zeppelin directories after extraction.
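
If you prefer the Terminal over Archive Utility, the same archives can be extracted with tar. The exact archive filenames below are assumed from the directory names used later in this tutorial, so adjust them to match the files you actually downloaded:

cd ~/Downloads
tar -xzf spark-2.0.1-bin-hadoop2.7.tgz
tar -xzf zeppelin-0.6.2-bin-all.tgz
tar -xzf sbt*.tgz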

Step 3: Install JDK

  • Double-click the jdk-8u121-macosx-x64.dmg file to launch the installation process.
  • Double-click the JDK8 Update 121.pkg icon to install.
Double-click the box icon to start the installation.
  • Follow the on-screen instructions to finish the installation.
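
To verify that the JDK is installed, run this command in the Terminal; it should report java version "1.8.0_121":

java -version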

Step 4: Install Python 3

  • Double-click the python-3.5.3-macosx10.6.pkg file to start the Python 3 installation.
Follow the on-screen instructions to finish the installation.
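
To verify the Python installation, run the following in the Terminal; it should print Python 3.5.3:

python3 --version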

Step 5: Configure Zeppelin

  • Start the “Terminal” program, which is located in /Applications/Utilities.
  • Go to the conf directory of Zeppelin at HOME/Downloads/zeppelin-0.6.2-bin-all/conf.
  • In the conf directory, you will see a file named zeppelin-env.sh.template.
  • Copy this file to a new file named zeppelin-env.sh.

cd Downloads/zeppelin-0.6.2-bin-all/conf
cp zeppelin-env.sh.template zeppelin-env.sh

Go to the conf directory of Zeppelin, then copy zeppelin-env.sh.template to zeppelin-env.sh.
  • Edit the zeppelin-env.sh file by adding this line at the very top of the file:

export SPARK_HOME=/Users/myaccount/Downloads/spark-2.0.1-bin-hadoop2.7

Declare the SPARK_HOME environment variable for Zeppelin.
  • Save and close the zeppelin-env.sh file.
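
To double-check the change, you can print the SPARK_HOME line back from the file:

grep SPARK_HOME zeppelin-env.sh

The output should include the export line you just added (the template may also contain a commented-out SPARK_HOME line, which is fine to leave as-is).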

Step 6: Setup shell environment by editing the ~/.bash_profile file

  • Using the Terminal, change directory back to HOME.
  • Open the ~/.bash_profile file using any text editor (e.g., nano, vi, or Sublime Text).
  • Add these lines to the file:

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home/
export SPARK_HOME=/Users/myaccount/Downloads/spark-2.0.1-bin-hadoop2.7
export SBT_HOME=/Users/myaccount/Downloads/sbt-launcher-packaging-0.13.13
export ZEPPELIN_HOME=/Users/myaccount/Downloads/zeppelin-0.6.2-bin-all

export PYSPARK_PYTHON=python3

export PATH=$JAVA_HOME/bin:$SPARK_HOME:$SPARK_HOME/bin:$SPARK_HOME/sbin:$SBT_HOME/bin:$ZEPPELIN_HOME/bin:$PATH

PATH="/Library/Frameworks/Python.framework/Versions/3.5/bin:${PATH}"
export PATH

  • Your ~/.bash_profile should look like the lines above.
  • Save and close the file.
  • Quit the Terminal program completely; otherwise the environment variables declared above will not be loaded in new Terminal sessions.
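
Alternatively, if you do not want to quit and reopen the Terminal, you can load the new settings into the current session and check one of the variables:

source ~/.bash_profile
echo $SPARK_HOME

The echo command should print /Users/myaccount/Downloads/spark-2.0.1-bin-hadoop2.7 (with your own HOME path).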

Step 7: Test Spark

  • Open the Terminal.

Step 7.1: Test pyspark

  • Launch pyspark (the Python shell of Spark) by running the pyspark command.
  • Note 1: with the PATH setting in Step 6, you should be able to invoke pyspark from anywhere on the system.
  • Note 2: check the first output line; it should report Python 3.5.3 (not 2.x.x).
The pyspark interactive shell should start with the Python 3.5.3 engine.
  • Type Ctrl-D to exit.
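
As an optional sanity check before exiting, the pyspark shell in Spark 2.0.x pre-defines spark (a SparkSession) and sc (a SparkContext), so you can type a couple of quick expressions:

spark.version                      # should print '2.0.1'
sc.parallelize(range(100)).sum()   # should return 4950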

Step 7.2: Test spark-shell

  • In the Terminal, launch spark-shell (the Scala shell of Spark) by running the spark-shell command.
  • Note: you should be able to invoke spark-shell from anywhere as well.
The Scala-based interactive Spark shell can be invoked with the spark-shell command.

Step 8: Test Zeppelin

  • In the Terminal, start the Zeppelin daemon with this command:

zeppelin-daemon.sh start

  • Open a web browser (e.g., Safari).
  • Go to localhost:8080.
  • The Zeppelin home page should appear as shown below.
  • To stop Zeppelin, simply run the zeppelin-daemon.sh stop command.
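
To confirm that Zeppelin can actually talk to Spark, create a new note from the home page and run a small paragraph. Below is a minimal example using the pyspark interpreter; it assumes the default Spark interpreter group is bound to the note, which should be the case for a fresh Zeppelin 0.6.2 install:

%pyspark
print(sc.version)                        # should print 2.0.1
print(sc.parallelize(range(100)).sum())  # should print 4950

The first run may take a moment, since Zeppelin starts the Spark interpreter on demand.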

That’s it. Enjoy your Spark and Zeppelin.
