Installing PySpark with Jupyter on an Edge Node
Before feeding a data set to an ML model, we often need to explore and pre-process it into the format we need. What better than Spark to perform the ETL and exploratory work! But logging into the Spark console every time, with no way to render analytical graphs, is no fun. So I ended up integrating a Jupyter notebook into my big data ecosystem. It turns out I was only about 3 minutes away from running my Spark code from a Jupyter notebook.
Assuming
A Spark cluster (I have 2.2) is up and running, conda is installed, and the Spark and Hadoop (2.7) binaries are in $PATH.
On the Edge Node
Confirm the Spark path, SPARK_HOME:
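For example (assuming SPARK_HOME is already exported in your shell profile):
echo $SPARK_HOME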

Create Conda env
conda create -n pyspark3 python=3
source activate pyspark3
Install jupyter
pip install jupyter
Configure jupyter notebook
jupyter notebook --generate-config
Remote login
This will generate the Jupyter notebook config file at /home/$USER/.jupyter/jupyter_notebook_config.py.
To make the Jupyter notebook reachable remotely, update the attributes below:
c.NotebookApp.ip = '*'
c.NotebookApp.port = $PORT_U_LIKE   # must be an integer, e.g. 8888


Secure your notebook
Run the command below to set up a password for your Jupyter server:
jupyter notebook password
This will store the hashed password in /home/$USER/.jupyter/jupyter_notebook_config.json. Copy the hash and update the /home/$USER/.jupyter/jupyter_notebook_config.py file with the attribute below:
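For the classic notebook server the attribute is c.NotebookApp.password; paste in the hash you copied (the value below is a placeholder):
c.NotebookApp.password = u'sha1:<hash-copied-from-the-json-file>'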

Toree
I found the Toree PySpark kernel to be extremely unstable with Spark 2.2, so I decided not to pursue it and went with findspark as an alternative.
Start jupyter notebook
Run the notebook in the background. Create a simple script called 'start-notebook.sh' containing the following:
#!/bin/bash
exec jupyter notebook --no-browser &> /dev/null &
Then add it to the path by copying start-notebook.sh to /usr/local/bin and run start-notebook.sh.
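For example (assuming you have sudo access on the edge node; adjust the paths to your setup):
sudo cp start-notebook.sh /usr/local/bin/
sudo chmod +x /usr/local/bin/start-notebook.sh
start-notebook.sh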
You should see the Jupyter notebook UI, something like below:

Run a Spark job
Create a new notebook, then import findspark and the necessary packages:
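A minimal sketch, assuming findspark is pip-installed in the pyspark3 env and SPARK_HOME points at the Spark 2.2 install:
import findspark
findspark.init()  # puts $SPARK_HOME/python on sys.path so pyspark imports work

from pyspark.sql import SparkSession
from pyspark.sql import functions as F  # commonly needed for DataFrame work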

Create a Spark session
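A sketch of the session builder; I assume a YARN-managed cluster here, so swap the master URL (for example 'local[*]') to suit your setup:
spark = (SparkSession.builder
         .appName('jupyter-exploration')
         .master('yarn')          # assumes YARN; use 'local[*]' for local mode
         .getOrCreate())
print(spark.version)              # should show 2.2.x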

SPARK — HDFS
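A sketch of reading a file straight from HDFS into a DataFrame; the CSV path is a hypothetical example, replace it with your own dataset:
df = (spark.read
      .option('header', 'true')
      .option('inferSchema', 'true')
      .csv('hdfs:///user/<your-user>/data/sample.csv'))  # hypothetical path
df.printSchema()
df.show(5)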

SPARK — HIVE
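To query Hive tables, build the session with Hive support enabled (this assumes hive-site.xml is available in Spark's conf directory on the edge node); the database and table names below are hypothetical:
spark = (SparkSession.builder
         .appName('jupyter-hive')
         .enableHiveSupport()     # needed so spark.sql can see the Hive metastore
         .getOrCreate())
spark.sql('show databases').show()
spark.sql('select * from my_db.my_table limit 10').show()  # hypothetical table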

That's it; you now have visual data exploration through Spark and a Jupyter notebook. Happy exploring!