Set up a local Spark cluster step by step in 10 minutes

Set up a local Spark cluster with one master node and one worker node on Ubuntu, completely from scratch and for free.

Andrew Zhu (Shudong Zhu)
CodeX

--

Tortle with 4 legs, by Charles Zhu, my 6-year-old son

This is an action list for installing the open-source Spark master (or driver) and worker on local Ubuntu machines, completely for free (in contrast to Databricks, which costs $$$).

The following setup runs on a home intranet: one physical Linux (Ubuntu) machine (a Jetson Nano) and one WSL2 (Ubuntu) instance inside Windows 10.

Step 1. Prepare environment

Make sure you have Java installed:

sudo apt install openjdk-8-jdk

Verify the Java installation:

java -version
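Spark also expects the JAVA_HOME environment variable to be set. A minimal sketch, assuming the default Ubuntu package path for OpenJDK 8 on amd64 (an assumption: on an arm64 board like the Jetson Nano the directory is the arm64 variant instead):

```shell
# Assumed install path for openjdk-8-jdk on Ubuntu/amd64;
# on arm64 (e.g. Jetson Nano) use java-8-openjdk-arm64 instead.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH="$JAVA_HOME/bin:$PATH"
echo "$JAVA_HOME"
```

Add the two export lines to your ~/.bashrc so they persist across sessions.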

If you are going to use PySpark, install Python as well:

sudo apt install python3

Verify the Python installation:

python3 --version

Step 2. Download and install Spark in the Driver machine

From the Spark download page, select your version; I picked the newest. Then download it (in any directory):

curl -O
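For illustration, here is what the download and unpack step can look like using the Apache archive. The version number and Hadoop profile below are assumptions; substitute the ones you selected on the download page:

```shell
# Example values only; replace with the version/profile you chose
# on https://spark.apache.org/downloads.html
SPARK_VERSION=3.5.0
HADOOP_PROFILE=hadoop3
TARBALL="spark-${SPARK_VERSION}-bin-${HADOOP_PROFILE}.tgz"

# Download the release tarball and extract it in the current directory
curl -O "https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${TARBALL}"
tar -xzf "$TARBALL"
```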
