Setting up Spark in JupyterLab

Ponshriharini · Published in featurepreneur · Feb 26, 2022

One spark can ignite the world

Java Installation

First, check whether Java is installed on your system using the commands below.

java --version
javac --version

If Java is not installed yet, use the command below to install it, then run the version checks again to verify the installation.

sudo apt install openjdk-11-jdk-headless

Next, we have to point JAVA_HOME at the Java installation. To check whether it’s already set, use:

echo $JAVA_HOME

If it returns nothing, JAVA_HOME has not been set yet. To set it:

  1. Get your java location
readlink -f $(which java)

This command returns a location that looks something like this:

/usr/lib/jvm/java-13-openjdk-amd64/bin/java

Remove ‘/bin/java’ from the end of this path and save the rest somewhere (the version segment will match whichever JDK you actually installed, so it may differ from this example).
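
If you’d rather not trim the path by hand, a one-liner like the below should give the same result (just a convenience sketch; double-check the output on your machine):

readlink -f $(which java) | sed 's:/bin/java::'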

2. Open your shell configuration file with gedit (we’re using ~/.zshrc here; if your shell is bash, edit ~/.bashrc instead)

gedit ~/.zshrc

3. Paste the line below into the gedit window and save the file

export JAVA_HOME=/usr/lib/jvm/java-13-openjdk-amd64

Note that we are using the location which we saved before.

4. Apply the change by running the below command in your terminal

source ~/.zshrc
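
As a quick sanity check that the variable actually points at a Java installation (assuming the path we saved above), you can run:

echo $JAVA_HOME
ls $JAVA_HOME/bin/java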

Java setup is done!

Scala Installation

We’ll first check whether Scala is already installed in our system using the version command.

scala -version

If it’s not installed yet, use the command below to install it, then check the version once again to verify the installation.

sudo apt-get install scala

Scala setup is done!

Spark Installation

  1. Use the wget command to get the required Spark archive version.
wget https://downloads.apache.org/spark/spark-3.2.1/spark-3.2.1-bin-hadoop3.2.tgz

You’ll see a ‘saved’ message in your terminal once the download completes successfully.

If this command doesn’t work (older releases are periodically removed from this mirror), go to the Apache Spark downloads page at https://spark.apache.org/downloads.html, copy the link for the current version, and adjust the file names in the next steps to match.

2. Now, we’ll extract it

tar xvf spark-3.2.1-bin-hadoop3.2.tgz

3. After extracting it, we’ll move it to the /opt/spark directory

sudo mv spark-3.2.1-bin-hadoop3.2 /opt/spark

4. Now we’ll configure the Spark environment variables using the below commands:

gedit ~/.profile

Paste these lines in your gedit and save the file.

export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
export PYSPARK_PYTHON=/usr/bin/python3

After saving, use the below command in your terminal

source ~/.profile
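
Before opening the shell, you can optionally confirm that the new variables are visible and that the Spark binaries are on your PATH (a quick check, assuming the paths above):

echo $SPARK_HOME
which spark-shell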

5. We’ll now open the Spark shell to confirm the installation

spark-shell
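
The shell should start up and leave you at a scala> prompt (exit it with :quit). If you’d like an end-to-end check beyond just opening the shell, the SparkPi example bundled with the distribution should also run locally:

$SPARK_HOME/bin/run-example SparkPi 10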

Spark setup is done!

spylon-kernel setup

Use the commands below to install Miniconda.

cd /tmp
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod +x Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh

conda config --set auto_activate_base false
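
The last line simply stops conda from auto-activating its base environment in every new shell. If the conda command isn’t found right after the installer finishes, open a new terminal (or source your shell config file) and run a quick sanity check:

conda --version
conda env list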

After installing Miniconda, we’ll now create a new environment to install our spylon kernel.

conda create -n py39spark -y python=3.9

We can now activate the new environment.

conda activate py39spark

We’ll now install JupyterLab, spylon and spylon-kernel using the following commands.

pip install jupyterlab
pip install spylon
pip install spylon-kernel
python -m spylon_kernel install --user
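
To confirm the kernel was registered with Jupyter, you can list the installed kernelspecs; spylon-kernel should appear in the output:

jupyter kernelspec list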

spylon kernel setup is done!

Hello world!

Now all you have to do is open JupyterLab and select the spylon kernel.

jupyter lab

This command starts JupyterLab and opens it in your browser.

Select the spylon-kernel under Notebook.

Now, to check whether everything works fine, we’ll print a simple Hello World statement (spylon-kernel cells run Scala by default, so this is Scala code).

print("Hello World!")

Run the cell and wait until the Spark session gets initialized; the first cell takes a little longer while the session starts up.

Your Spark setup is all done!!!!

Happy coding!
