java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset
--
As a Data Engineer I use WSL2 for my Linux dev environment (Python, Scala, …), because I like the Windows OS alongside the Linux kernel and the open-source software and community. The problem is that Windows and Spark don't play well together when you try to run a Spark job on Windows (no matter whether, like me, you use Docker to run your local Spark cluster).
If you want to run a Spark job on Windows that uses Spark SQL, you need to set the HADOOP_HOME environment variable manually; if you don't, the job will crash with:
java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset
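To make the failure mode concrete, here is a minimal sketch (a hypothetical example, not from the original post) of the kind of local PySpark job that hits this error on Windows: as soon as Spark SQL writes to the local filesystem, Hadoop looks for winutils.exe under HADOOP_HOME.

# Hypothetical minimal job that can fail with the HADOOP_HOME error on Windows
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local[*]")              # local run, e.g. while developing on Windows
    .appName("winutils-demo")
    .getOrCreate()
)

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.createOrReplaceTempView("t")

# Writing the query result to the local filesystem is where Hadoop needs winutils.exe
spark.sql("SELECT id, value FROM t").write.mode("overwrite").parquet("out/demo")

spark.stop()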
There is a very quick fix for this:
1. Clone this repo on your Windows machine: https://github.com/cdarlint/winutils
NOTE: I recommend cloning the entire repo so that you can use any version of winutils.exe.
2. Set the HADOOP_HOME environment variable to the root path of the version folder that matches the Hadoop version your Spark build uses.
3. Add %HADOOP_HOME%\bin to the PATH environment variable.
cmd> set HADOOP_HOME=<your local hadoop-ver folder WITHOUT 'bin'>
cmd> set PATH=%PATH%;%HADOOP_HOME%\bin
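Note that set only affects the current cmd session; to make the variables permanent, define them in the Windows environment-variables dialog (or with setx). Alternatively, if you prefer not to change the system environment at all, you can export the variables from the job itself before the SparkSession is created. The sketch below assumes a PySpark job and uses an example path to the cloned winutils repo (adjust it to your own checkout and Hadoop version):

# Hypothetical alternative: set HADOOP_HOME/PATH from Python before Spark starts,
# so the JVM that PySpark launches inherits them.
import os

hadoop_home = r"C:\tools\winutils\hadoop-3.3.5"   # example path to your version folder (without \bin)
os.environ["HADOOP_HOME"] = hadoop_home
os.environ["PATH"] = os.environ["PATH"] + os.pathsep + os.path.join(hadoop_home, "bin")

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("winutils-env-demo").getOrCreate()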
NOTE: For more info please read this