Setting up Spark in JupyterLab
One spark can ignite the world
Java Installation
First, check whether Java is installed on your system using the commands below.
java --version
javac --version
If Java is not installed yet, use the command below to install it, then run the version commands again to verify the installation.
sudo apt install openjdk-11-jdk-headless
Now, we have to set the JAVA_HOME location. To check whether it’s already been set, use:
echo $JAVA_HOME
If it does not return anything, the location has not been set yet. To set it:
1. Get your Java location
readlink -f $(which java)
This command returns a location that looks something like:
/usr/lib/jvm/java-13-openjdk-amd64/bin/java
Remove ‘/bin/java’ from this location and save it somewhere.
2. Open the file in gedit (if your shell is bash, edit ~/.bashrc instead)
gedit ~/.zshrc
3. Paste the line below into the gedit window
export JAVA_HOME=/usr/lib/jvm/java-13-openjdk-amd64
Note that we are using the location we saved earlier.
4. Save the changes and run the command below in your terminal
source ~/.zshrc
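Incidentally, steps 1–4 can be scripted. Below is a minimal sketch; the helper name `java_home_from_binary` is ours (not a standard command), and it assumes `readlink` resolves the java binary as shown above.

```shell
# Sketch: derive JAVA_HOME automatically instead of trimming the path by hand.
# The function name is ours (illustrative only); it strips the /bin/java suffix.
java_home_from_binary() {
  echo "${1%/bin/java}"
}

# The manual steps above, condensed into one line:
#   export JAVA_HOME=$(java_home_from_binary "$(readlink -f "$(which java)")")
```

You would still paste the resulting export line into ~/.zshrc so it persists across sessions.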
Java setup is done!
Scala Installation
We’ll first check whether Scala is already installed on our system using the version command.
scala -version
If it’s not installed yet, use the command below to install it, then check the version again to verify the installation.
sudo apt-get install scala
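If you want just the version number (for scripts or a quick sanity check), the banner can be parsed. This is a sketch of ours, assuming the usual “Scala code runner version X.Y.Z” banner, which Scala prints on stderr.

```shell
# Sketch: extract the bare version number from the `scala -version` banner.
# The helper name is ours; it picks out the dotted number after "version".
scala_version_from_banner() {
  sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p'
}

# Typical usage (2>&1 because scala prints its banner on stderr):
#   scala -version 2>&1 | scala_version_from_banner
```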
Scala setup is done!
Spark Installation
1. Use the wget command to download the required Spark archive.
wget https://downloads.apache.org/spark/spark-3.2.1/spark-3.2.1-bin-hadoop3.2.tgz
You’ll see a ‘saved’ message in your terminal once the download completes successfully.
If this command doesn’t work, go to the Apache Spark download page and get the latest version instead.
2. Now, we’ll extract it
tar xvf spark-3.2.1-bin-hadoop3.2.tgz
3. After extracting it, we’ll move it to the /opt/spark directory
sudo mv spark-3.2.1-bin-hadoop3.2 /opt/spark
4. Now we’ll configure the Spark environment. Open your profile file:
gedit ~/.profile
Paste these lines in your gedit and save the file.
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
export PYSPARK_PYTHON=/usr/bin/python3
After saving, run the command below in your terminal
source ~/.profile
5. We’ll now open the Spark shell to confirm the installation
spark-shell
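If you’d rather verify the install non-interactively, a tiny helper can confirm the launcher sits where the exports expect it. The function below is our own sketch, not part of Spark:

```shell
# Sketch: succeed only if the given directory holds an executable spark-shell.
# check_spark_home is an illustrative name of ours, not a Spark command.
check_spark_home() {
  if [ -x "$1/bin/spark-shell" ]; then
    echo "ok"
  else
    echo "missing"
  fi
}

# After the steps above you would expect "ok" from:
#   check_spark_home /opt/spark
```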
Spark setup is done!
spylon-kernel Setup
Use the commands below to install Miniconda.
cd /tmp
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod +x Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh
conda config --set auto_activate_base false
After installing Miniconda, we’ll create a new environment in which to install the spylon kernel.
conda create -n py39spark -y python=3.9
We can now activate the new environment.
conda activate py39spark
We’ll now install JupyterLab, spylon, and spylon-kernel using the following commands.
pip install jupyterlab
pip install spylon
pip install spylon-kernel
python -m spylon_kernel install --user
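To confirm the kernel actually registered, you can scan Jupyter’s kernelspec listing. The `has_kernel` wrapper below is our own sketch; the real check is just `jupyter kernelspec list`.

```shell
# Sketch: report whether a kernel name appears in `jupyter kernelspec list`
# output piped in on stdin. has_kernel is an illustrative name of ours.
has_kernel() {
  if grep -q "$1"; then
    echo "found"
  else
    echo "not found"
  fi
}

# Typical usage once the pip installs above have finished:
#   jupyter kernelspec list | has_kernel spylon_kernel
```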
spylon kernel setup is done!
Hello world!
Now all you have to do is open JupyterLab and select the spylon kernel.
jupyter lab
This command opens JupyterLab.
Select the spylon-kernel under Notebook.
Now, to check whether everything works fine, we’ll print a simple Hello world statement.
print("Hello World!")
Run the cell and wait until the Spark session gets initialized.
Your Spark setup is all done!!!!
Happy coding!