Windows Subsystem for Linux, Airflow, and Net.exe To Connect to Shared Drives from Airflow

I kind of like it, or maybe I have Stockholm syndrome

Andrew Troiano
The Startup
4 min read · Feb 4, 2020


If you work in a large organization with a ton of developer resources, this article might not be for you. If you are like me and wear many hats and have the flexibility to use whatever tools you want, then keep reading.

The goal is to give you an overview of how I’m using WSL, Airflow, and Net.exe as well as provide links to resources that will be helpful for the installation and configuration of WSL and Airflow.

I am by no means an expert. If you have feedback or know of a different/better way to do some of this stuff, let me know in the comments!

When a Linux machine isn’t in the cards

Due to things outside my control, I can’t get a sweet, clean, standalone Linux machine. What I was able to get: Windows Server 2019. This supports WSL.

WSL allows you to install a Linux distribution within a Windows OS. For more information, check out this article: https://docs.microsoft.com/en-us/windows/wsl/faq

The Good of WSL

  • You can mount local Windows folders into the Linux environment. The local mount is useful because it lets you use a code editor or IDE in Windows to edit files, like DAGs, that are also accessible in Linux.
  • You can use net.exe commands in Airflow bash commands, which provides authentication to Windows shared folders. (Example of how to do this is below)
  • It’s a fully running Linux environment
  • For commands run within the terminal by your user, like accessing a network drive, WSL will pass through the credentials from the Windows operating system.
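
To make the local mount bullet concrete, here is a minimal sketch of how a Windows path maps to its Linux-side path. It assumes the default WSL automount location (/mnt/<drive letter>), which can be changed in /etc/wsl.conf. The function name win_to_wsl and the example path are made up for illustration. WSL also ships a wslpath utility that does this conversion for you; the hand-rolled version is only here to show the mapping.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a Windows path such as C:\airflow\dags to the
# path it gets under WSL's default automount root (/mnt/<drive letter>).
win_to_wsl() {
    win_path="$1"
    # First character is the drive letter; WSL uses it in lowercase.
    drive=$(printf '%s' "$win_path" | cut -c1 | tr 'A-Z' 'a-z')
    # Drop the "C:" prefix and turn backslashes into forward slashes.
    rest=$(printf '%s' "$win_path" | cut -c3- | tr '\\' '/')
    printf '/mnt/%s%s\n' "$drive" "$rest"
}

# Example folder (made up for illustration).
win_to_wsl 'C:\airflow\dags'   # prints /mnt/c/airflow/dags
```

So a DAG file saved from a Windows editor into C:\airflow\dags would show up in Linux under /mnt/c/airflow/dags, assuming that is where your DAGs folder lives.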

The Bad of WSL

  • WSL only runs when a user is logged into the machine
  • No systemctl (I’ll lay out a workaround below)
  • It’s running within Windows, so performance isn’t as great as a standalone environment (this is my assumption, I don’t have a way to benchmark).
  • I’m sure there are others I haven’t run into yet

The Hacks of WSL

  • The lack of systemctl means you need a bunch of scripts that run as soon as the user logs in to make sure your services start. This is what I’ve set up for Airflow and Apache Superset. It’s not the prettiest solution.
  1. For the Run Airflow Scheduler entry, I am running something along the lines of: C:\Windows\System32\bash.exe -c '~/.local/bin/start_airflow_scheduler.sh'
  2. The start_airflow_scheduler.sh script sets my Airflow home variable, activates the Python env, and starts the scheduler.
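
Here is a rough sketch of what a start_airflow_scheduler.sh along those lines could look like. It is not my exact script: the AIRFLOW_HOME and VENV_DIR defaults are hypothetical paths, and the DRY_RUN switch is something I added for this sketch so you can see the steps without starting anything. It defaults to a dry run; set DRY_RUN=0 to actually start the scheduler.

```bash
#!/usr/bin/env bash
# Sketch of ~/.local/bin/start_airflow_scheduler.sh.
# ASSUMPTIONS: the AIRFLOW_HOME and VENV_DIR defaults are hypothetical paths.
# Defaults to a dry run that only prints the steps; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

# 1. Set the Airflow home variable.
export AIRFLOW_HOME="${AIRFLOW_HOME:-$HOME/airflow}"
VENV_DIR="${VENV_DIR:-$HOME/venvs/airflow}"

start_scheduler() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ export AIRFLOW_HOME=%s\n' "$AIRFLOW_HOME"
        printf '+ . %s/bin/activate\n' "$VENV_DIR"
        printf '+ airflow scheduler\n'
        return 0
    fi
    # 2. Activate the Python environment (fail early if it is missing).
    if [ ! -f "$VENV_DIR/bin/activate" ]; then
        echo "virtualenv not found at $VENV_DIR" >&2
        return 1
    fi
    . "$VENV_DIR/bin/activate"
    # 3. Start the scheduler in the foreground.
    airflow scheduler
}

start_scheduler
```

The same pattern should work for the other services: one small script per service, each launched at login through bash.exe -c.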

Net.exe in Airflow DAG To Authenticate on Network Drive

Look at this sweet bash command. This is how I have a bash command in a DAG authenticate the Airflow worker to the domain and run an R script that accesses a network file. This is really powerful because it lets you automate mundane Excel work for your co-workers. The command does five things:

  1. source /pwd.sh loads a file that contains your password. In this scenario, I’m not showing how to use encrypted passwords.
  2. Using net.exe to stop and start the Workstation service forces WSL to disconnect and reconnect all open connections. This closes any open connections and makes sure you authenticate smoothly.
  3. The net.exe use command connects to the shared folder. Assume $password is the name of the variable that is set in the /pwd.sh file.
  4. umount and mount make sure the mounted shared drive is active. I noticed that sometimes the mounted folder would persist through DAG runs.
  5. /usr/bin/Rscript is how we call R in Airflow. This method is based on James Long’s method in this post.
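
The five steps above can be sketched as one small script that the DAG's bash command calls. Treat this as a sketch under assumptions, not my exact command: the share, the domain account, the mount point, and the R script path are all hypothetical, the /y flag on net.exe stop and the drvfs mount type are my best guess at a reasonable setup, and the sudo calls assume the Airflow user can run mount without a password prompt. It defaults to a dry run that only prints each step; set DRY_RUN=0 to run it for real.

```bash
#!/usr/bin/env bash
# Sketch only. ASSUMPTIONS: share, account, mount point and R script below
# are hypothetical; sudo is assumed to work without a password prompt.
# Defaults to a dry run that prints each step; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"
SHARE='\\fileserver\reports'         # hypothetical UNC path
SHARE_USER='MYDOMAIN\svc_airflow'    # hypothetical domain account
MOUNT_POINT='/mnt/reports'           # hypothetical mount point
R_SCRIPT="$HOME/scripts/report.R"    # hypothetical R script

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

refresh_share_and_run() {
    # 1. Load $password (plain text here; encryption is not covered).
    if [ "$DRY_RUN" = "1" ]; then
        password='<set by /pwd.sh>'
        printf '+ . /pwd.sh\n'
    else
        . /pwd.sh
    fi
    # 2. Restart the Workstation service to drop stale connections.
    run net.exe stop workstation /y
    run net.exe start workstation
    # 3. Authenticate to the shared folder.
    run net.exe use "$SHARE" "$password" "/user:$SHARE_USER"
    # 4. Remount the share (umount may fail harmlessly if nothing is mounted).
    run sudo umount "$MOUNT_POINT"
    run sudo mount -t drvfs "$SHARE" "$MOUNT_POINT"
    # 5. Run the R script; its exit code becomes the script's exit code.
    run /usr/bin/Rscript "$R_SCRIPT"
}

refresh_share_and_run
```

Because the R script runs last, a failure in R is what Airflow sees as the task's exit code, which is usually what you want for retries and alerts.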

Airflow in WSL Set Up

This is the best guide on getting it set up: https://www.astronomer.io/guides/airflow-wsl/

Once it is installed and working, I’ve had no issues with steady-state operations.

Conclusion

For a smaller team or company that doesn’t have all the tools and resources of larger or more technology-driven companies, using WSL is one way to get access to Linux tools that can help improve your day-to-day life. There are some benefits and some drawbacks that you’ll need to weigh before you decide to do it. If you have any questions, feel free to shout at me on Twitter (Andy Troiano). You can also give me a follow if you like Data Science, Sports, or R.
