Getting Started with PySpark and Jupyter
Sep 6, 2018
Install Jupyter Notebook
$ pip install jupyter
Install PySpark
Before installing Spark, make sure you have Java 8 or a later version installed. Then download the latest Spark release, choosing the package prebuilt for Hadoop.
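For example, you can check your Java installation and download the release used in this guide from the Apache archive (the URL below assumes Spark 2.2.1 prebuilt for Hadoop 2.7; adjust it for the version you actually want):
$ java -version
$ wget https://archive.apache.org/dist/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz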
Extract it and move it to the /opt folder
$ tar -xzf spark-2.2.1-bin-hadoop2.7.tgz
$ mv spark-2.2.1-bin-hadoop2.7 /opt/spark-2.2.1
Create a symbolic link
$ ln -s /opt/spark-2.2.1 /opt/spark
Configure your environment variables by adding the following lines to your ~/.bashrc:
export SPARK_HOME=/opt/spark
export PATH=$SPARK_HOME/bin:$PATH
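Reload your shell configuration and check that the variables are picked up (a quick sanity check; if the PATH is set correctly, spark-submit should print the Spark version):
$ source ~/.bashrc
$ echo $SPARK_HOME
$ spark-submit --version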
PySpark in Jupyter
Update the PySpark environment variables by adding the following lines to your ~/.bashrc file:
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
Restart the terminal and run PySpark:
$ pyspark
Now you will be able to run PySpark in a Jupyter notebook!
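As a final check, run a small snippet in a new notebook. This is a minimal sketch; it relies on the ready-made SparkSession named spark that the pyspark launcher creates for you in Spark 2.x:
# The pyspark launcher pre-creates a SparkSession named `spark` (Spark 2.x)
print(spark.version)
# Build a tiny DataFrame and display it to confirm everything works
df = spark.createDataFrame([("Alice", 34), ("Bob", 45)], ["name", "age"])
df.show()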