Installing Spark on Ubuntu 20.04 on DigitalOcean in 2020

Data Ninja
Solving the Human Problem
Sep 1, 2020

Here is a quick cheat sheet for getting a standalone Spark cluster running on an Ubuntu server.

Install Hadoop:

# install the Java development kit (JDK)
sudo apt-get install default-jdk
# Download Hadoop from http://hadoop.apache.org/releases.html
# The following mirror worked for me:
wget https://ftp.tsukuba.wide.ad.jp/software/apache/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
# extract the archive
tar -xvf hadoop-3.3.0.tar.gz
# move the extracted folder to /usr/local:
sudo mv hadoop-3.3.0 /usr/local/hadoop
# find the location of JDK:
readlink -f /usr/bin/java | sed "s#bin/java##"
# > /usr/lib/jvm/java-11-openjdk-amd64/
# Edit Hadoop config file and set JAVA_HOME
vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh
# find the line for JAVA_HOME and set it to the JDK path as follows:
# export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/
# Test:
/usr/local/hadoop/bin/hadoop version
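
Optionally, set JAVA_HOME and put Hadoop on your PATH so the commands resolve without full paths. A minimal sketch, assuming the install locations used above:

# append the environment variables to your shell profile
echo 'export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/' >> ~/.bashrc
echo 'export HADOOP_HOME=/usr/local/hadoop' >> ~/.bashrc
echo 'export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin' >> ~/.bashrc
source ~/.bashrc
# hadoop should now run without the full path:
hadoop version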

Install Spark:

# Download the latest version of Spark from here: https://spark.apache.org/downloads.html
# The following mirror worked for me; pick one close to you:
wget https://ftp.riken.jp/net/apache/spark/spark-3.0.0/spark-3.0.0-bin-hadoop2.7.tgz
# extract the archive:
tar -xvf spark-3.0.0-bin-hadoop2.7.tgz
# move spark to /usr/local
sudo mv spark-3.0.0-bin-hadoop2.7 /usr/local/spark
# Start a standalone Spark master with the following command:
/usr/local/spark/sbin/start-master.sh
# Stop the master with the following:
/usr/local/spark/sbin/stop-master.sh
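# Note (assumption): if your DigitalOcean droplet runs the ufw firewall,
# open the standalone ports before workers or drivers try to connect:
sudo ufw allow 8080/tcp   # master web UI
sudo ufw allow 7077/tcp   # port that workers and drivers connect to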
# You can start a worker with the following (replace MASTER_IP with your master's address):
/usr/local/spark/sbin/start-slave.sh spark://MASTER_IP:7077
# To stop the worker, run:
/usr/local/spark/sbin/stop-slave.sh
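
Once the master and a worker are running, you can sanity-check the cluster. A quick sketch, assuming the default web UI port and the example jar that ships with this Spark build:

# the master's web UI lists registered workers (default port 8080):
# http://MASTER_IP:8080
# open an interactive shell against the cluster:
/usr/local/spark/bin/spark-shell --master spark://MASTER_IP:7077
# or run the bundled SparkPi example as a smoke test:
/usr/local/spark/bin/spark-submit \
  --master spark://MASTER_IP:7077 \
  --class org.apache.spark.examples.SparkPi \
  /usr/local/spark/examples/jars/spark-examples_2.12-3.0.0.jar 10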
