Installing Spark on Ubuntu 20.04 on DigitalOcean in 2020

Data Ninja
Solving the Human Problem
Sep 1, 2020

Here is a quick cheat sheet for getting a standalone Spark cluster running on an Ubuntu server.

Install Hadoop:

# install the Java development kit (JDK)
sudo apt-get install default-jdk
# Download Hadoop from http://hadoop.apache.org/releases.html
# The following mirror worked for me:
wget https://ftp.tsukuba.wide.ad.jp/software/apache/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
# extract the archive
tar -xvf hadoop-3.3.0.tar.gz
# move the extracted folder to /usr/local:
sudo mv hadoop-3.3.0 /usr/local/hadoop
# find the location of JDK:
readlink -f /usr/bin/java | sed "s#bin/java##"
# > /usr/lib/jvm/java-11-openjdk-amd64/
# Edit Hadoop config file and set JAVA_HOME
vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh
# find the line for JAVA_HOME and set it to the JDK path as follows:
# export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/
# Test:
/usr/local/hadoop/bin/hadoop version
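
Optionally, set JAVA_HOME and put Hadoop on your PATH so the commands resolve without full paths. A minimal sketch, assuming the install locations used above:

# append the environment variables to your shell profile
echo 'export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/' >> ~/.bashrc
echo 'export HADOOP_HOME=/usr/local/hadoop' >> ~/.bashrc
echo 'export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin' >> ~/.bashrc
source ~/.bashrc
# hadoop should now run without the full path:
hadoop version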

Install Spark:

# Download the latest version of Spark from here: https://spark.apache.org/downloads.html
# The following mirror worked for me; pick one close to you:
wget https://ftp.riken.jp/net/apache/spark/spark-3.0.0/spark-3.0.0-bin-hadoop2.7.tgz
# extract the archive:
tar -xvf spark-3.0.0-bin-hadoop2.7.tgz
# move spark to /usr/local
sudo mv spark-3.0.0-bin-hadoop2.7 /usr/local/spark
# Start a standalone Spark master with the following command:
/usr/local/spark/sbin/start-master.sh
# Stop the master with the following:
/usr/local/spark/sbin/stop-master.sh
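# Note (assumption): if your DigitalOcean droplet runs the ufw firewall,
# open the standalone ports before workers or drivers try to connect:
sudo ufw allow 8080/tcp   # master web UI
sudo ufw allow 7077/tcp   # port that workers and drivers connect to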
# You can start a worker with the following (replace MASTER_IP with your master's address):
/usr/local/spark/sbin/start-slave.sh spark://MASTER_IP:7077
# To stop the worker, run:
/usr/local/spark/sbin/stop-slave.sh
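
Once the master and a worker are running, you can sanity-check the cluster. A quick sketch, assuming the default web UI port and the example jar that ships with this Spark build:

# the master's web UI lists registered workers (default port 8080):
# http://MASTER_IP:8080
# open an interactive shell against the cluster:
/usr/local/spark/bin/spark-shell --master spark://MASTER_IP:7077
# or run the bundled SparkPi example as a smoke test:
/usr/local/spark/bin/spark-submit \
  --master spark://MASTER_IP:7077 \
  --class org.apache.spark.examples.SparkPi \
  /usr/local/spark/examples/jars/spark-examples_2.12-3.0.0.jar 10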
