Installation of Apache Spark (PySpark) on Linux (Ubuntu)

Different Ways of Installation
Getting a Linux Machine Ready With Spark

Your machine may run Windows, Linux, or Mac.
If Linux, you are good to go ahead.
If Windows or Mac, you have two options:

Virtual Machine: Linux + VirtualBox
Cloud Setup:
- an on-demand instance with Hadoop + Spark pre-installed, or
- a plain Linux virtual machine on which you install Apache Spark yourself

Providers to consider:
Amazon EC2
GCP
Databricks
Digital Ocean
Rackspace
Linode
Windows users can spin up a Digital Ocean Linux instance and install Spark there; that is the route this guide follows.
If you are looking for installation on the Windows machine itself, check:
https://medium.com/@ankit.25587/installation-of-apache-spark-on-windows-4e4e4141f877
Installation Part 1: Cloud Digital Ocean Setup

Create a Digital Ocean account (a credit card is required).
Create a droplet.
Connect to the virtual instance through SSH on port 22:

ssh root@159.65.177.136

(Replace the IP address with your own droplet's IP.)
Installation Part 2: Jupyter Notebook + Python3

Check whether Python 2 and Python 3 are installed. Install the Jupyter notebook if it is not already there:

sudo apt install python3
sudo apt install python3-pip
pip3 install jupyter
jupyter notebook --ip=$ip --allow-root
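Before moving on, a quick sanity check can save debugging later. This short script (a minimal sketch; "notebook" is the import name the Jupyter notebook package exposes) confirms that Python 3 and Jupyter are both usable:

# run with: python3 check_env.py
import sys

print("Python:", sys.version.split()[0])   # expect a 3.x version

try:
    import notebook   # the package behind "jupyter notebook"
    print("Jupyter notebook:", notebook.__version__)
except ImportError:
    print("Jupyter notebook missing - run: pip3 install jupyter")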
Installation Part 3: Install Scala, Java, Py4j, Spark

Why Scala:
Spark is written in Scala.

sudo apt-get install scala
scala -version

Why Java:
The Scala compiler converts Spark's code to JVM bytecode, so a JVM must be present.
No need for the full JDK; just the JRE is sufficient.

sudo apt-get install default-jre
java -version

Why Py4j:
Py4j lets Python programs talk to objects in a JVM; PySpark uses it to drive Spark's Java/Scala core (see the sketch after the download steps below).

pip3 install py4j

Spark:

wget http://redrockdigimark.com/apachemirror/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz

Change the URL to your nearest Apache mirror.

tar -xvf spark-2.2.1-bin-hadoop2.7.tgz
mv spark-2.2.1-bin-hadoop2.7 spark
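As promised above, here is a minimal, illustrative sketch of the Python-to-Java bridge that py4j provides; it is not part of the installation. launch_gateway starts a small throwaway JVM using the jar bundled with the pip package, and attribute access on gateway.jvm is forwarded into that JVM. PySpark does something very similar internally.

from py4j.java_gateway import JavaGateway, GatewayParameters, launch_gateway

# Start a helper JVM and get back the port its py4j server listens on
port = launch_gateway(die_on_exit=True)
gateway = JavaGateway(gateway_parameters=GatewayParameters(port=port))

# Create a real java.util.Random instance inside the JVM and call it from Python
rnd = gateway.jvm.java.util.Random()
print(rnd.nextInt(100))   # the random number is generated in Java

gateway.shutdown()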
Installation Part 4: Set Path and Start Jupyter Notebook

export SPARK_HOME='/root/spark'   # change if you extracted Spark elsewhere
export PATH=$SPARK_HOME/bin:$PATH
export PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH
export PYSPARK_DRIVER_PYTHON="jupyter"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
export PYSPARK_PYTHON=python3

Add these lines to .bashrc, then reload it:

. ~/.bashrc
Verify

From the command line:

python3
>>> import pyspark

From a Jupyter notebook:

jupyter notebook --ip=$ip --allow-root   (--allow-root is needed only when running as the root user)

Then, in a notebook cell:

import pyspark
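If the import works, a small end-to-end job is a stronger test that Spark itself runs. A minimal smoke test using the RDD API that ships with Spark 2.2 (the app name "install-check" here is arbitrary):

from pyspark import SparkContext

# local[*] runs Spark on all cores of this machine; no cluster required
sc = SparkContext("local[*]", "install-check")

# Distribute the numbers 0..99 and count the even ones
rdd = sc.parallelize(range(100))
print(rdd.filter(lambda x: x % 2 == 0).count())   # expect 50

sc.stop()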
Check out the complete course on Apache Spark with Python (PySpark) at: