Installation of Apache Spark on Windows
Sep 6, 2018
If you are looking for installation on a Linux machine, check https://medium.com/@ankit.25587/installation-of-apache-saprk-on-linux-ubuntu-7d6fffab5b27
Getting a Windows Machine Ready for Spark
Installation Part — 1: Jupyter Notebook + Python 3
Check whether Python 2 and Python 3 are installed.
Install Python 3, either through Anaconda or the standalone Python installer.
Start the notebook:
jupyter notebook
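Once Python 3 is installed, you can confirm the interpreter version from a Python prompt (a minimal check, nothing Spark-specific yet):

import sys
print(sys.version)  # should report 3.x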
Installation Part — 2: Install Java, Py4j, Spark
Why Java: Spark runs on the JVM; its Scala code compiles to JVM bytecode.
Download: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
Why Py4j: Py4j bridges Python to the JVM.
pip install py4j
Spark:
wget http://redrockdigimark.com/apachemirror/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz
(Change the URL to your nearest mirror.)
Extract it to C:\spark
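With Java and py4j in place, a quick sanity check from Python (a minimal sketch; it assumes java is already on your PATH):

import subprocess
import py4j  # fails here if pip install py4j did not succeed

subprocess.run(["java", "-version"])  # prints the installed JDK version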
Installation Part — 3: Set Winutils
Download winutils.exe from https://raw.githubusercontent.com/steveloughran/winutils/master/hadoop-2.6.0/bin/winutils.exe
and put it in C:\winutils\bin (create the folder if it does not exist).
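If Spark later complains about permissions on the Hive scratch directory (a common issue on Windows installs), one widely used workaround is to relax permissions on C:\tmp\hive with winutils. A sketch using Python's subprocess; the path is the conventional default, not something this guide configures:

import os
import subprocess

os.makedirs(r"C:\tmp\hive", exist_ok=True)  # Spark's default scratch dir
subprocess.run([r"C:\winutils\bin\winutils.exe", "chmod", "777", r"C:\tmp\hive"], check=True)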
Installation Part — 4: Set Path and Start Jupyter Notebook
Set the variables below in the USER variables for your PC:

SPARK_HOME = C:\spark
PYSPARK_DRIVER_PYTHON = ipython
PYSPARK_DRIVER_PYTHON_OPTS = notebook
PATH = %SPARK_HOME%\bin (append to the existing value)
JAVA_HOME = C:\Program Files\Java\jdk1.8.0_161
HADOOP_HOME = C:\winutils
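To confirm the variables are visible (open a new terminal after setting them, since already-running shells do not pick up the change), a quick check from Python:

import os

for var in ["SPARK_HOME", "JAVA_HOME", "HADOOP_HOME",
            "PYSPARK_DRIVER_PYTHON", "PYSPARK_DRIVER_PYTHON_OPTS"]:
    print(var, "=", os.environ.get(var))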
Verify
From the command line (install findspark first with pip install findspark, if you have not already):

python
import findspark
findspark.init()
import pyspark
From a Jupyter notebook:

jupyter notebook

Then, in a notebook cell:

import findspark
findspark.init()
import pyspark
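As a final smoke test you can go one step beyond the imports and run a tiny job; a minimal sketch (the app name is arbitrary):

import findspark
findspark.init()

import pyspark

sc = pyspark.SparkContext(appName="smoke-test")
# Count 0..99 in parallel; a working install prints 100.
print(sc.parallelize(range(100)).count())
sc.stop()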
Check out the full course on Apache Spark with Python (PySpark) at