Setting up Spark 1.6.1 on Ubuntu 16.04

There are five key steps to set up Spark 1.6.1 on an Ubuntu machine:

  1. Ubuntu installation (this guide assumes Ubuntu 16.04 is already installed).
  2. Download three files: the Apache Spark binary, the Java JDK, and sbt.
  3. Pre-installation: 3.1) configure the SSH server, 3.2) configure password-less SSH login, 3.3) configure IPv6
  4. Installation: 4.1) extract the files, 4.2) create shortcuts (optional but recommended), 4.3) configure the .bashrc file, 4.4) configure Spark, 4.5) close and re-open the terminal
  5. Test run: 5.1) pyspark, 5.2) spark-shell, 5.3) SparkPi

1. Ubuntu installation

If you have not installed Ubuntu 16.04, you can find instructions here.

2. Downloads

There are only three files to download:

Apache Spark 1.6.1 (binary pre-built for Hadoop 2.6)
Java SE JDK (version 8u92)
sbt (version 0.13.11)
  • From Spark’s home page, click “Download Spark”.
  • In step 2), select “binary, pre-built for Hadoop 2.6”.
  • Then in step 4), click the spark-1.6.1-bin-hadoop2.6.tgz link.
  • Click the first link to download the file.
  • Choose to save the file to the Downloads directory.
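
If you prefer the command line to the browser, the Spark archive can also be fetched with wget. This is a sketch that assumes release 1.6.1 is still hosted on the Apache archive server (archive.apache.org) under the URL layout shown; the function name spark_archive_url is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the Apache archive URL of a pre-built Spark
# package such as spark-1.6.1-bin-hadoop2.6.tgz (assumed URL layout).
spark_archive_url() {
  local spark_version="$1" hadoop_version="$2"
  echo "https://archive.apache.org/dist/spark/spark-${spark_version}/spark-${spark_version}-bin-hadoop${hadoop_version}.tgz"
}

# Usage: download into ~/Downloads, like the browser steps
# wget -P ~/Downloads "$(spark_archive_url 1.6.1 2.6)"
```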

3. Pre-Installation

3.1 Configure SSH server

These commands update the package index and then install the OpenSSH server.

sudo apt-get update
sudo apt-get install openssh-server

3.2 Configure password-less ssh login

The concept is simple: generate a private/public key pair, then add the public key to the list of authorized keys. You can read this for more information.

cd
ssh-keygen -t rsa -P ""
cat ./.ssh/id_rsa.pub >> ./.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
## THEN restart the SSH service
sudo service ssh restart
## OR reboot the machine
sudo reboot
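
If ssh localhost still asks for a password, check the file permissions first: with its default StrictModes setting, sshd refuses to use an authorized_keys file that other users can write to, or that sits in a directory they can write to. A minimal sketch that compares against the commonly recommended modes (700 for ~/.ssh, 600 for authorized_keys); it assumes GNU stat as shipped with Ubuntu, and the function name check_ssh_perms is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether ~/.ssh and authorized_keys have the
# commonly recommended permissions for key-based login (700 and 600).
check_ssh_perms() {
  local ssh_dir="${1:-$HOME/.ssh}"   # directory to check (default: ~/.ssh)
  local dir_mode key_mode
  dir_mode=$(stat -c '%a' "$ssh_dir") || return 1
  key_mode=$(stat -c '%a' "$ssh_dir/authorized_keys") || return 1
  if [ "$dir_mode" = "700" ] && [ "$key_mode" = "600" ]; then
    echo "ok"
  else
    echo "fix: chmod 700 $ssh_dir && chmod 600 $ssh_dir/authorized_keys"
    return 1
  fi
}

# Usage: check the real ~/.ssh, then try logging in without a password
# check_ssh_perms && ssh localhost exit
```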

3.3 Configure IPv6

These settings disable IPv6 for all interfaces, for newly created (default) interfaces and for the loopback interface.

cd
sudo vi /etc/sysctl.conf
## Then add these lines to the file ##
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
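
The lines in /etc/sysctl.conf are applied at boot; sudo sysctl -p reloads that file so they take effect right away. The current value can then be read from /proc/sys, where 1 means IPv6 is disabled. A minimal sketch; the function name ipv6_disabled is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if IPv6 is disabled according to the given
# /proc/sys file (default: the setting for all interfaces).
ipv6_disabled() {
  local flag="${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}"
  [ -r "$flag" ] && [ "$(cat "$flag")" = "1" ]
}

# Usage: load /etc/sysctl.conf without rebooting, then check the result
# sudo sysctl -p
# ipv6_disabled && echo "IPv6 is disabled"
```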

4. Installation

4.1 Extract downloaded files

Use the “tar xzvf” command to decompress and extract the downloaded archives.

cd
cd Downloads
tar xzvf jdk-8u92-linux-x64.tar.gz
tar xzvf spark-1.6.1-bin-hadoop2.6.tgz
tar xzvf sbt-0.13.11.tgz
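
Listing an archive with tar tzf shows which directory it unpacks to without extracting it, and that is the name the soft links in step 4.2 must point at (the sbt archive, for example, unpacks to a directory called plain sbt). A minimal sketch; the function name top_dirs is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the top-level entries of a .tgz/.tar.gz archive,
# i.e. the directory name(s) it unpacks to, without extracting anything.
top_dirs() {
  tar tzf "$1" | sed -e 's|^\./||' | cut -d/ -f1 | grep -v '^$' | sort -u
}

# Usage (in ~/Downloads):
# top_dirs spark-1.6.1-bin-hadoop2.6.tgz
# top_dirs sbt-0.13.11.tgz
```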

4.2 create shortcuts (soft links)

This is recommended because it makes future Spark upgrades easier. To install a new Spark version, extract it, re-point these soft links to the new version's directory, and leave the .bashrc configuration untouched. (Spark's own configuration file, conf/spark-env.sh, lives inside the versioned Spark directory, so copy it into the new version's conf directory as well.)

cd
ln -s ./Downloads/jdk1.8.0_92/ ./jdk
ln -s ./Downloads/spark-1.6.1-bin-hadoop2.6 ./spark
ln -s ./Downloads/sbt ./sbt
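
Upgrading means replacing an existing link rather than creating a new one. With GNU ln, -f removes the old link first, and -n makes ln treat the existing link itself as the name to replace instead of following it into the old Spark directory (without -n, the new link would be created inside that directory). A sketch of the upgrade in a throwaway directory, so the real ~/spark link is not touched; spark-1.6.2-bin-hadoop2.6 stands in for a hypothetical newer release.

```shell
#!/usr/bin/env bash
# Sketch: re-point a "spark" soft link from one version directory to another.
# Runs in a temporary directory so the real ~/spark link is not touched.
demo=$(mktemp -d)
mkdir -p "$demo/Downloads/spark-1.6.1-bin-hadoop2.6" \
         "$demo/Downloads/spark-1.6.2-bin-hadoop2.6"   # hypothetical newer version
cd "$demo"

ln -s ./Downloads/spark-1.6.1-bin-hadoop2.6 ./spark     # initial install
ln -sfn ./Downloads/spark-1.6.2-bin-hadoop2.6 ./spark   # upgrade: replace the link itself

readlink ./spark   # prints ./Downloads/spark-1.6.2-bin-hadoop2.6
```

For the real install, the same ln -sfn command is run from the home directory with the actual version directory names.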

4.3 Configure .bashrc

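This sets JAVA_HOME, SBT_HOME and SPARK_HOME, and adds their bin directories (plus Spark's sbin directory) to the PATH variable. Replace /home/luck with your own home directory.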

cd
vi ./.bashrc
## Then add these lines to the file ##
export JAVA_HOME=/home/luck/jdk
export SBT_HOME=/home/luck/sbt
export SPARK_HOME=/home/luck/spark
export PATH=$PATH:$JAVA_HOME/bin
export PATH=$PATH:$SBT_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin
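
Running source ~/.bashrc reloads the file in the current terminal. A minimal sketch for checking that the three variables point at existing directories and that the Spark launchers are on the PATH; the function name check_spark_env is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check the variables exported in ~/.bashrc.
check_spark_env() {
  local pair name dir
  for pair in "JAVA_HOME=$JAVA_HOME" "SBT_HOME=$SBT_HOME" "SPARK_HOME=$SPARK_HOME"; do
    name="${pair%%=*}"   # text before the first "=": the variable name
    dir="${pair#*=}"     # text after the first "=": the variable's value
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
      echo "$name is not set to an existing directory" >&2
      return 1
    fi
  done
  # pyspark and spark-shell live in $SPARK_HOME/bin, which must be on PATH
  command -v pyspark >/dev/null && command -v spark-shell >/dev/null
}

# Usage: reload .bashrc in the current terminal, then check it
# source ~/.bashrc
# check_spark_env && echo "environment looks fine"
```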

4.4 Configure Spark

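Only one file needs changing at the moment: spark-env.sh. Copy it from the template file and add the lines below. SPARK_MASTER_IP is the IP address the Spark master binds to (replace 10.1.20.241 with your machine's IP address), and SPARK_WORKER_MEMORY is the total memory a worker can give to executors.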

cd
cd spark/conf/
cp spark-env.sh.template spark-env.sh
vi spark-env.sh
## Then add these lines to the file ##
JAVA_HOME=/home/luck/jdk
SPARK_MASTER_IP=10.1.20.241
SPARK_WORKER_MEMORY=4g
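
With SPARK_MASTER_IP set, a standalone master and a worker can be started with the scripts in $SPARK_HOME/sbin. The worker needs the master's URL, which has the form spark://HOST:PORT with 7077 as the default port (the master's web UI listens on port 8080 by default). A minimal sketch that derives the URL from spark-env.sh, falling back to the machine's hostname as start-master.sh does when SPARK_MASTER_IP is unset; the helper name master_url is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the standalone master URL from spark-env.sh.
# 7077 is Spark's default master port when SPARK_MASTER_PORT is not set.
master_url() {
  local env_file="${1:-$SPARK_HOME/conf/spark-env.sh}"
  (
    # Source in a subshell so the settings do not leak into the caller.
    . "$env_file"
    echo "spark://${SPARK_MASTER_IP:-$(hostname)}:${SPARK_MASTER_PORT:-7077}"
  )
}

# Usage: start a master and one worker on this machine
# $SPARK_HOME/sbin/start-master.sh
# $SPARK_HOME/sbin/start-slave.sh "$(master_url)"
```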

4.5 Close and re-open the terminal

A new terminal reads the updated .bashrc, so the new environment variables take effect.

5. Test Run

5.1 Test Spark with Python shell (pyspark)

The interactive Python shell for Spark is started with the “pyspark” command. Test it with a simple sum and a print command. To exit pyspark, press Ctrl-D.

cd
pyspark
>>> a = 5
>>> b = 3
>>> a+b
8
>>> print("Welcome to Spark")
Welcome to Spark
## type Ctrl-d to exit

5.2 Test Spark with Scala shell (spark-shell)

The interactive Scala shell for Spark is started with the “spark-shell” command. Test it with the same simple commands. To exit spark-shell, type “exit”.

cd
spark-shell
scala> val a = 5
a: Int = 5
scala> val b = 3
b: Int = 3
scala> a+b
res0: Int = 8
scala> print("Welcome to Spark and Scala")
Welcome to Spark and Scala
scala> exit

5.3 Test Run SparkPi

Spark programs can also be run non-interactively in several ways. Here we use the run-example command to estimate the value of Pi with Spark. It prints many log lines; look for the Pi value near the end of the output.

cd
run-example org.apache.spark.examples.SparkPi
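
To cut the output down to the result, the log lines can be discarded: Spark's default logging configuration writes log messages to stderr, while SparkPi prints its result (a line starting with "Pi is roughly") to stdout. SparkPi also accepts an optional argument, the number of partitions (slices) to split the work into, which defaults to 2. A minimal sketch; the function name spark_pi is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run SparkPi and keep only its result line.
# Spark's log messages go to stderr by default, so they are discarded here.
spark_pi() {
  local slices="${1:-2}"   # number of partitions; SparkPi's own default is 2
  run-example org.apache.spark.examples.SparkPi "$slices" 2>/dev/null \
    | grep 'Pi is roughly'
}

# Usage:
# spark_pi 10
```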