Setting up Spark 1.6.1 on Ubuntu 16.04

Luck Charoenwatana

Jul 3, 2016

In brief, there are 5 key steps to set up Spark 1.6.1 on an Ubuntu machine.

1. Install Ubuntu 16.04 (this guide assumes you have already done so).
2. Download 3 files: the Apache Spark binary, the Java JDK, and sbt.
3. Pre-installation: 3.1) configure the SSH server, 3.2) configure password-less SSH login, 3.3) configure IPv6.
4. Installation: 4.1) extract the files, 4.2) create shortcuts (optional but recommended), 4.3) configure the .bashrc file, 4.4) configure the Spark configuration file, 4.5) close and re-open the terminal.
5. Test run: 5.1) pyspark, 5.2) spark-shell, 5.3) SparkPi.

1. Ubuntu installation

If you have not installed Ubuntu 16.04, you can find instructions here.

2. Downloads

There are only 3 files to download:

Apache Spark (binary for hadoop 2.6)
JavaSE JDK (version 8u92)
Sbt (version 0.13.11)

From Spark’s home page, click “Download Spark”.

In 2), select the “binary, pre-built for Hadoop 2.6” package.
Then in 4), click the link for spark-1.6.1-bin-hadoop2.6.tgz.

Click the first link to download the file

Download Spark — Save to Downloads directory

Choose to save the file to the Downloads directory.

3. Pre-Installation

3.1 Configure SSH server

This step updates the package index and then installs the OpenSSH server.

sudo apt-get update
sudo apt-get install openssh-server

3.2 Configure password-less ssh login

The concept is simple: generate a private and public key pair, then add the public key to the list of authorized keys. You can read this for more information.

cd
ssh-keygen -t rsa -P ""
cat ./.ssh/id_rsa.pub >> ./.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

## THEN
sudo service ssh restart
## -- OR --
sudo reboot
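
To confirm that the key-based login works before moving on, a quick check along these lines may help. This is only a sketch and not part of the original steps: it assumes the SSH server runs on the same machine and that you have already accepted the host key once by running ssh localhost interactively. BatchMode=yes makes ssh fail instead of prompting for a password, and ssh_status is just an illustrative variable name.

```shell
# Try a non-interactive login to this machine and run the no-op command "true".
# BatchMode=yes disables password prompts; ConnectTimeout=5 gives up after 5 seconds.
if ssh -o BatchMode=yes -o ConnectTimeout=5 localhost true 2>/dev/null; then
  ssh_status="ok: password-less login to localhost works"
else
  ssh_status="failed: password-less login to localhost is not working yet"
fi
echo "$ssh_status"
```

If the check reports a failure, re-check the permissions on ~/.ssh/authorized_keys and make sure the ssh service was restarted.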

3.3 Configure IPv6

These settings disable IPv6.

cd
sudo vi /etc/sysctl.conf

## Then add these lines to the file ##
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
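
Editing /etc/sysctl.conf alone does not change the running system; the settings are normally loaded at boot, or on demand with sudo sysctl -p. As a rough, read-only check, something like the following shows the current values, where 1 means IPv6 is disabled. It assumes a Linux kernel that exposes these settings under /proc/sys, which the steps above do not state explicitly.

```shell
# Optional: reload /etc/sysctl.conf without rebooting (needs root, so left commented out).
# sudo sysctl -p

# Read-only check of the three settings added above.
checked=0
for name in all default lo; do
  f="/proc/sys/net/ipv6/conf/$name/disable_ipv6"
  if [ -r "$f" ]; then
    echo "$name: disable_ipv6 = $(cat "$f")"
  else
    echo "$name: $f is not readable on this system"
  fi
  checked=$((checked + 1))
done
```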

4. Installation

4.1 Extract downloaded files

Use the “tar xzvf” command to unzip and untar the downloaded files.

cd
cd Downloads
tar xzvf jdk-8u92-linux-x64.tar.gz
tar xzvf spark-1.6.1-bin-hadoop2.6.tgz
tar xzvf sbt-0.13.11.tgz

4.2 Create shortcuts (soft links)

This is recommended because it makes life easier when you upgrade Spark in the future. When a new Spark version comes out, just install it, re-point these soft links to the new version number, and leave Spark’s and .bashrc configurations untouched.

cd
ln -s ./Downloads/jdk1.8.0_92/ ./jdk
ln -s ./Downloads/spark-1.6.1-bin-hadoop2.6 ./spark
ln -s ./Downloads/sbt ./sbt
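
To illustrate the upgrade idea, here is a small sketch of re-pointing the spark link at a newer release. It runs entirely inside a throwaway directory with empty stand-in folders, and the 1.6.2 directory name is only an assumed example of a newer version, not something installed in this guide.

```shell
# Work in a throwaway directory so nothing in the real home directory is touched.
demo=$(mktemp -d)
cd "$demo"
mkdir -p Downloads/spark-1.6.1-bin-hadoop2.6 Downloads/spark-1.6.2-bin-hadoop2.6

# Initial link, as in the step above.
ln -s ./Downloads/spark-1.6.1-bin-hadoop2.6 ./spark
before=$(readlink ./spark)

# Upgrade: -f replaces the existing link, and -n stops ln from
# following the old link into the directory it points to.
ln -sfn ./Downloads/spark-1.6.2-bin-hadoop2.6 ./spark
after=$(readlink ./spark)

echo "before: $before"
echo "after:  $after"
```

Because everything else refers to ./spark rather than a versioned directory, only the link itself has to change.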

4.3 Configure .bashrc

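This step sets JAVA_HOME, SBT_HOME, and SPARK_HOME, and adds their bin directories to the PATH variable. Replace /home/luck with your own home directory.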

cd
vi ./.bashrc

## Then add these lines to the file ##
export JAVA_HOME=/home/luck/jdk
export SBT_HOME=/home/luck/sbt
export SPARK_HOME=/home/luck/spark

export PATH=$PATH:$JAVA_HOME/bin
export PATH=$PATH:$SBT_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin
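
A variation that may be easier to reuse is sketched below. It assumes the soft links from step 4.2 live directly under your home directory, so it uses $HOME instead of a hard-coded /home/luck, and it adds a small loop (not in the original setup) that reports whether each directory actually exists.

```shell
# Same variables as above, but relative to the current user's home directory.
export JAVA_HOME="$HOME/jdk"
export SBT_HOME="$HOME/sbt"
export SPARK_HOME="$HOME/spark"
export PATH="$PATH:$JAVA_HOME/bin:$SBT_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin"

# Report which of the expected directories are present.
for dir in "$JAVA_HOME/bin" "$SBT_HOME/bin" "$SPARK_HOME/bin" "$SPARK_HOME/sbin"; do
  if [ -d "$dir" ]; then
    echo "ok:      $dir"
  else
    echo "missing: $dir"
  fi
done
```

A "missing" line usually means a soft link from step 4.2 points at the wrong directory name.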

4.4 Configure Spark

Only one file needs configuring at the moment: spark-env.sh. Copy it from the template file and add a few lines. Replace 10.1.20.241 with your own machine’s IP address.

cd
cd spark/conf/
cp spark-env.sh.template spark-env.sh
vi spark-env.sh

## Then add these lines to the file ##
JAVA_HOME=/home/luck/jdk
SPARK_MASTER_IP=10.1.20.241
SPARK_WORKER_MEMORY=4g

4.5 Exit the terminal and re-open the terminal

Re-opening the terminal reloads .bashrc so that the new environment variables take effect.

5. Test Run

5.1 Test Spark with Python shell (pyspark)

The interactive Python shell for Spark is started with the “pyspark” command. Test it with simple sum and print commands. To exit pyspark, press Ctrl-d.

cd
pyspark

>>> a = 5
>>> b = 3
>>> a+b
8
>>> print("Welcome to Spark")
Welcome to Spark

## type Ctrl-d to exit

5.2 Test Spark with Scala shell (spark-shell)

The interactive Scala shell for Spark is started with the “spark-shell” command. Test it with the same simple commands. To exit spark-shell, type “exit”.

cd
spark-shell

scala> val a = 5
a: Int = 5
scala> val b = 3
b: Int = 3
scala> a+b
res0: Int = 8
scala> print("Welcome to Spark and Scala")
Welcome to Spark and Scala
scala> exit

running the spark-shell (Scala on Spark)

5.3 Test Run SparkPi

Non-interactive Spark execution can be run in a number of ways. Here we try the run-example command to compute the value of Pi using Spark. You will see lots of output lines; look for the Pi value (a line starting with “Pi is roughly”) near the end.

cd
run-example org.apache.spark.examples.SparkPi
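
To make the number in that output less mysterious: SparkPi estimates Pi with a Monte Carlo method, throwing random points at a square and counting how many land inside the inscribed circle. The following is a single-machine sketch of the same idea using awk. It is not the Spark example's actual code, and the sample count and seed are arbitrary choices made for illustration.

```shell
# Estimate Pi by sampling random points in the square [-1, 1] x [-1, 1]
# and counting how many fall inside the unit circle (area ratio = Pi / 4).
pi_estimate=$(awk 'BEGIN {
  srand(1)                 # fixed seed so repeated runs give the same estimate
  n = 100000               # number of sample points
  inside = 0
  for (i = 0; i < n; i++) {
    x = rand() * 2 - 1
    y = rand() * 2 - 1
    if (x * x + y * y < 1) inside++
  }
  printf "%.4f", 4 * inside / n
}')
echo "Pi is roughly $pi_estimate"
```

Spark spreads the same sampling loop across its workers and then adds up the counts, which is why the result is only approximately 3.14.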