Installation of Apache Spark on Windows

Ankit Mistry
Sep 6, 2018 · 2 min read

If you are looking for installation on a Linux machine, check — https://medium.com/@ankit.25587/installation-of-apache-saprk-on-linux-ubuntu-7d6fffab5b27

Getting a Windows Machine Ready with Spark

Installation Part — 1 Jupyter notebook + Python3

Check whether Python 2 and Python 3 are installed.
Install Python 3 — either via Anaconda or the standalone Python installer.
Start the notebook: jupyter notebook
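A quick way to confirm that the interpreter you are on is Python 3 (a minimal sketch; the exact version string will differ per install):

```python
import sys

# PySpark in this guide assumes Python 3; fail fast otherwise
assert sys.version_info.major == 3, "Install Python 3 (Anaconda or the standalone installer)"
print(sys.version.split()[0])
```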

Installation Part — 2 Install Java, Py4j, Spark

Why Java: the Spark compiler converts Scala code to JVM bytecode.
Download: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
Why Py4J: Py4J bridges Python to Java. Install it with: pip install py4j
Spark: wget http://redrockdigimark.com/apachemirror/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz (change the URL as per your nearest mirror)

Extract it to C:\spark
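After installing the JDK, you can check that java is visible before going further (a hedged sketch; shutil.which returns None when the JDK's bin folder is not on PATH):

```python
import shutil

# Spark runs on the JVM, so the JDK's bin directory must be on PATH
java_path = shutil.which("java")
if java_path is None:
    print("java not found on PATH; install the JDK and re-check")
else:
    print("java found at:", java_path)
```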

Installation Part — 3 Set Winutils

Download : https://raw.githubusercontent.com/steveloughran/winutils/master/hadoop-2.6.0/bin/winutils.exe


Put it in C:\winutils\bin (create the folder if it does not exist).
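Creating the folder can also be scripted (a sketch using the path from this article; os.makedirs with exist_ok=True is a no-op when the folder already exists):

```python
import os

# Folder where winutils.exe must live; HADOOP_HOME will point at C:\winutils
target = r"C:\winutils\bin"
os.makedirs(target, exist_ok=True)
print("created (or already present):", target)
```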

Installation Part — 4 Set Path and start Jupyter notebook

Set the variables below in the USER variables for your PC:
SPARK_HOME = C:\spark
PYSPARK_DRIVER_PYTHON_OPTS = notebook
PYSPARK_DRIVER_PYTHON = ipython
PATH = add %SPARK_HOME%\bin
JAVA_HOME = C:\Program Files\Java\jdk1.8.0_161
HADOOP_HOME = C:\winutils
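The same variables can also be set for a single session from Python before initializing findspark (a sketch; the paths are the ones used in this article and should match your install locations):

```python
import os

# Session-scoped equivalents of the USER variables above
os.environ["SPARK_HOME"] = r"C:\spark"
os.environ["HADOOP_HOME"] = r"C:\winutils"
os.environ["JAVA_HOME"] = r"C:\Program Files\Java\jdk1.8.0_161"
os.environ["PYSPARK_DRIVER_PYTHON"] = "ipython"
os.environ["PYSPARK_DRIVER_PYTHON_OPTS"] = "notebook"
print(os.environ["SPARK_HOME"])
```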

Verify

From the command line:

python
import findspark
findspark.init()
import pyspark

From a Jupyter notebook:

jupyter notebook
import findspark
findspark.init()
import pyspark
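The verification steps above can be wrapped in one small smoke test (a sketch assuming findspark and pyspark are installed; it prints a hint instead of crashing when they are missing):

```python
# Try the same import sequence used above and report the result
try:
    import findspark
    findspark.init()          # locates SPARK_HOME and puts pyspark on sys.path
    import pyspark
    print("OK, pyspark version:", pyspark.__version__)
    spark_ready = True
except Exception as exc:      # ImportError, or findspark failing to find Spark
    print("Spark not ready yet:", exc)
    spark_ready = False
```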

Check out the full course on Apache Spark with Python (pyspark) at
