java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset

Enrique Catala · 1 min read · May 28, 2022


As a Data Engineer I'm using WSL2 to run my Linux dev station (Python, Scala, …), because I really like the Windows OS as well as the Linux kernel and the open-source software and community. The thing is that there is an annoying problem with Windows and Spark when you try to run a Spark job on Windows (no matter if, like me, you are using Docker to run your local Spark cluster).

If you want to run a Spark job on Windows that makes use of Spark SQL, you need to set the HADOOP_HOME environment variable manually, because if you don't you will get a crash like this:

java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset
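For context, a job as simple as the following is enough to hit the exception on Windows, because any Spark SQL work that writes to the local filesystem goes through the Hadoop client libraries, which look for %HADOOP_HOME%\bin\winutils.exe. This is a minimal sketch in Scala; the object name and the C:/tmp/out path are just illustrative:

```scala
import org.apache.spark.sql.SparkSession

object LocalSqlJob {
  def main(args: Array[String]): Unit = {
    // A plain local SparkSession; nothing Windows-specific in the code itself
    val spark = SparkSession.builder()
      .appName("hadoop-home-repro")
      .master("local[*]")
      .getOrCreate()

    import spark.implicits._

    // Writing the result of a Spark SQL query to local disk is what triggers
    // the lookup of winutils.exe on Windows
    val df = Seq((1, "a"), (2, "b")).toDF("id", "value")
    df.createOrReplaceTempView("t")
    spark.sql("SELECT id, value FROM t")
      .write.mode("overwrite")
      .parquet("C:/tmp/out")

    spark.stop()
  }
}
```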

There is a very quick fix for this:

1. Clone this repo on your Windows machine: https://github.com/cdarlint/winutils

NOTE: I recommend cloning the entire repo, because that way you will be able to use any version of winutils.exe

2. Set the HADOOP_HOME environment variable to the root path of the version you need, depending on the Hadoop version your Spark distribution was built against

3. Add %HADOOP_HOME%\bin to your PATH environment variable

cmd> set HADOOP_HOME=<your local hadoop-ver folder WITHOUT 'bin'>
cmd> set PATH=%PATH%;%HADOOP_HOME%\bin
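If you would rather not touch system-wide settings, the same lookup can also be satisfied from the job itself by setting the hadoop.home.dir system property before the SparkSession is created (the error message shows Hadoop checks both HADOOP_HOME and hadoop.home.dir). A minimal sketch, assuming it runs before any Hadoop class has been loaded in the JVM; the C:/winutils/hadoop-3.2.2 path is just an example folder from the cloned repo, adjust it to the version you picked:

```scala
import org.apache.spark.sql.SparkSession

object HadoopHomeFromCode {
  def main(args: Array[String]): Unit = {
    // Point to the folder that CONTAINS bin\winutils.exe, not the bin folder itself
    System.setProperty("hadoop.home.dir", "C:/winutils/hadoop-3.2.2")

    val spark = SparkSession.builder()
      .appName("hadoop-home-from-code")
      .master("local[*]")
      .getOrCreate()

    // Quick sanity check: print both places the Hadoop shell code looks at
    println(s"HADOOP_HOME     = ${sys.env.getOrElse("HADOOP_HOME", "<unset>")}")
    println(s"hadoop.home.dir = ${System.getProperty("hadoop.home.dir")}")

    spark.stop()
  }
}
```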

NOTE: For more info please read this

