Setting up Apache-Airflow in Windows using WSL 2

(A Step-by-Step Guide to Setup Apache-Airflow on Windows 10)

Mukesh Kumar
Accredian
6 min read · Aug 9, 2022


Written in collaboration with Hiren Rupchandani

Preface

In the previous story, you learned how to set up Ubuntu 20.04 on Windows 10 as a Windows Subsystem for Linux (WSL) distribution. In this article, I will walk you through the installation of Apache Airflow in WSL 2 using a virtual environment.

Installation of pip on WSL 2

  • To set up a virtual environment, we need to install a Python package named virtualenv.
  • We will use the pip command for the same.
  • If pip is not already available in your Linux distribution, execute the following command in an Ubuntu terminal:
username@desktop_name:~$ sudo apt install python3-pip
[sudo] password for username:
  • Type in the Linux password to proceed with the installation.
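  • Once the installation finishes, you can confirm that pip is available by checking its version (the exact version shown will vary with your Ubuntu release):
username@desktop_name:~$ pip3 --version
pip 20.0.2 from /usr/lib/python3/dist-packages/pip (python 3.8)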

Installation of virtualenv package

After successfully installing pip, you will now have to install the virtualenv package using the following command:

username@desktop_name:~$ sudo pip3 install virtualenv
[sudo] password for username:
OUTPUT:
Collecting virtualenv
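You can verify the installation by checking the installed version (the number printed will vary):

username@desktop_name:~$ virtualenv --version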

Creation of Virtual Environment

  • We will now create a virtual environment.
  • It will keep the project's libraries and dependencies separate from the global Python installation and from other projects' libraries, avoiding conflicts between them.
  • We can create a virtual environment in WSL using the following command:
username@desktop_name:~$ virtualenv airflow_env
OUTPUT:
created virtual environment CPython3.8.10.final.0-64 in 841ms
creator CPython3Posix(dest=/home/username/airflow_env, clear=False, no_vcs_ignore=False, global=False)
seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/home/username/.local/share/virtualenv)
added seed packages: pip==21.2.2, setuptools==57.4.0, wheel==0.36.2
activators BashActivator,CShellActivator,FishActivator,PowerShellActivator,PythonActivator
  • To activate the environment, execute the following command:
username@desktop_name:~$ source airflow_env/bin/activate
  • You should now see the virtual environment's name preceding the terminal prompt, like this:
(airflow_env) username@desktop_name:~$
  • It indicates that the virtual environment has been activated, and all the upcoming code instructions will take effect only inside this environment.
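  • Whenever you want to leave the environment, run the deactivate command, and the prompt returns to normal:
(airflow_env) username@desktop_name:~$ deactivate
username@desktop_name:~$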

Setting Up an Environment Variable for the Airflow Directory

  • Next, we will set up an environment variable to help us navigate to the airflow directory every time we restart the system.
  • We will first create a directory named airflow on Windows. Our directory will be located at C:/Users/username/Documents/.
  • Next, we will get the directory path in WSL using the command line, and then we will set this path in our WSL environment.
  • In WSL, the Windows drives are mounted under the /mnt folder, and the Documents directory is located under our username directory inside /mnt/c/Users.
  • We will now navigate to the airflow directory from the root directory using the following commands:
username@desktop_name:~$ cd /
username@desktop_name:/$ ls
OUTPUT:
bin dev etc g init lib32 libx32 media opt root sbin srv tmp var
boot home lib lib64 lost+found mnt proc run snap sys usr
username@desktop_name:/$ cd mnt
username@desktop_name:/mnt$ cd c
username@desktop_name:/mnt/c$ ls
OUTPUT:
'$Recycle.Bin' AVScanner.ini MSOCache 'Program Files (x86)' Users pagefile.sys '$WinREAgent' Config.Msi Octave ProgramData Windows swapfile.sys 'Documents and Settings' PerfLogs Recovery temp tmp 'Program Files' 'System Volume Information' hiberfil.sys tools
username@desktop_name:/mnt/c$ cd Users
username@desktop_name:/mnt/c/Users$ ls
OUTPUT:
'All Users' Default 'Default User' username Public desktop.ini
username@desktop_name:/mnt/c/Users$ cd username
username@desktop_name:/mnt/c/Users/username$ ls
OUTPUT:
OneDrive Pictures PrintHood Anaconda3 Recent AppData Searches 'Application Data' myWebApp 'Creative Cloud Files' Templates Desktop Documents Downloads MicrosoftEdgeBackups Music 'My Documents'
username@desktop_name:/mnt/c/Users/username$ cd Documents
username@desktop_name:/mnt/c/Users/username/Documents$ ls
OUTPUT:
Zoom airflow desktop.ini
username@desktop_name:/mnt/c/Users/username/Documents$ cd airflow
username@desktop_name:/mnt/c/Users/username/Documents/airflow$
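  • Note that you don't have to walk the tree one directory at a time; once you know the path, you can jump to it in a single command:
username@desktop_name:~$ cd /mnt/c/Users/username/Documents/airflow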
  • Next, copy the entire path of the airflow directory, /mnt/c/Users/username/Documents/airflow, and store it in an environment variable.
  • To create the environment variable, open a new terminal and enter the following command to edit the bash configuration file:
username@desktop_name:~$ sudo nano ~/.bashrc
  • Add the following line anywhere in the file:
export AIRFLOW_HOME=/mnt/c/Users/username/Documents/airflow

Entering the export command in the ~/.bashrc file

  • This line saves the airflow directory path in an environment variable named AIRFLOW_HOME. Save the file (Ctrl+O, then Enter) and exit nano (Ctrl+X), then close all the open terminals so the change takes effect (or run source ~/.bashrc).
  • Next time you start the terminal, you can write cd $AIRFLOW_HOME to go to the airflow directory in one go.
username@desktop_name:~$ cd $AIRFLOW_HOME
username@desktop_name:/mnt/c/Users/username/Documents/airflow$
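  • You can confirm that the variable is set correctly by printing it:
username@desktop_name:~$ echo $AIRFLOW_HOME
OUTPUT:
/mnt/c/Users/username/Documents/airflow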

Installation of Airflow & Essential Libraries

  • Now, you need to open a new terminal and activate the virtual environment using the following command:
username@desktop_name:~$ source airflow_env/bin/activate
  • To install Airflow and some essential extras, execute the following command (the quotes keep the shell from interpreting the square brackets):
(airflow_env) username@desktop_name:~$ pip3 install 'apache-airflow[gcp,sentry,statsd]'
OUTPUT:
Collecting apache-airflow[gcp,sentry,statsd]
.
Installing collected packages: (All the downloaded packages)
  • The installation may take some time. Make sure you have an active internet connection. Grab a cup of coffee while the installation proceeds…
  • After a successful installation, you may also need to install some additional libraries, such as scikit-learn and pyspark (note that the package on PyPI is named scikit-learn, not sklearn):
(airflow_env) username@desktop_name:~$ pip3 install pyspark
(airflow_env) username@desktop_name:~$ pip3 install scikit-learn
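  • A note in case pip reports dependency conflicts: the Airflow project recommends installing against a constraints file pinned to your Airflow and Python versions. A sketch, assuming Airflow 2.3.3 on Python 3.8 (substitute the versions you are using):
(airflow_env) username@desktop_name:~$ pip3 install 'apache-airflow[gcp,sentry,statsd]==2.3.3' --constraint 'https://raw.githubusercontent.com/apache/airflow/constraints-2.3.3/constraints-3.8.txt'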

Initialization of Airflow Database

  • Next, go to the airflow directory and initialize the Airflow metadata database using the following commands:
(airflow_env) username@desktop_name:~$ cd $AIRFLOW_HOME
(airflow_env) username@desktop_name:/mnt/c/Users/username/Documents/airflow$ airflow db init
OUTPUT:
Modules imported successfully
Initialization done
  • If you execute the following command, you will observe some files and directories inside the airflow directory.
(airflow_env) username@desktop_name:/mnt/c/Users/username/Documents/airflow$ ls
OUTPUT:
airflow.cfg airflow.db logs webserver_config.py
  • It is time to create a dags folder. All future DAGs will be stored here and accessed by the Airflow components.
(airflow_env) username@desktop_name:/mnt/c/Users/username/Documents/airflow$ mkdir dags
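  • If you would like to verify the setup end to end later, you can drop a minimal DAG file into this folder. The sketch below is illustrative (the file name hello_airflow.py, the DAG id, and the task are all made up for this example); it defines a single BashOperator task that echoes a message once a day:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# A minimal DAG with one task that runs a shell command.
with DAG(
    dag_id="hello_airflow",            # illustrative name
    start_date=datetime(2022, 8, 1),
    schedule_interval="@daily",
    catchup=False,                     # don't backfill runs for past dates
) as dag:
    say_hello = BashOperator(
        task_id="say_hello",
        bash_command="echo 'Hello, Airflow!'",
    )

  • Once the scheduler is running (see below), hello_airflow should appear in the DAG list of the web UI.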

Creating a New Airflow User

  • On the first startup of Airflow, you must create a new user.
  • This can be done with the help of the “users create” command.
  • To create a new user with the username admin and the Admin role, run the following command:
(airflow_env) username@desktop_name:/mnt/c/Users/username/Documents/airflow$ airflow users create --username admin --password your_password --firstname your_first_name --lastname your_last_name --role Admin --email your_email@some.com
  • Run the following command to check if the user was created successfully:
(airflow_env) username@desktop_name:/mnt/c/Users/username/Documents/airflow$ airflow users list
OUTPUT:
id | username | email               | first_name      | last_name      | roles
===+==========+=====================+=================+================+======
1  | admin    | your_email@some.com | your_first_name | your_last_name | Admin
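  • If you made a typo while creating the account, you can delete it and start over (the username below matches the one created above):
(airflow_env) username@desktop_name:/mnt/c/Users/username/Documents/airflow$ airflow users delete --username admin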

Running of the Airflow Scheduler and Webserver

  • Now start the airflow scheduler using the airflow scheduler command after activating the virtual environment:
username@desktop_name:~$ source airflow_env/bin/activate
(airflow_env) username@desktop_name:~$ cd $AIRFLOW_HOME
(airflow_env) username@desktop_name:/mnt/c/Users/username/Documents/airflow$ airflow scheduler
  • Open a new terminal, activate the virtual environment, go to the airflow directory, and start the web server.
username@desktop_name:~$ source airflow_env/bin/activate
(airflow_env) username@desktop_name:~$ cd $AIRFLOW_HOME
(airflow_env) username@desktop_name:/mnt/c/Users/username/Documents/airflow$ airflow webserver
  • Once the scheduler and webserver have initialized, open any browser and go to http://localhost:8080/.
  • Port 8080 is the default port for the Airflow webserver, and you should see the following login page:
Airflow Login
  • If the page doesn’t load, or port 8080 is occupied by some other program, open the airflow.cfg file and change the port number (the web_server_port setting).
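  • Alternatively, you can pass a different port directly on the command line; 8081 below is just an example:
(airflow_env) username@desktop_name:/mnt/c/Users/username/Documents/airflow$ airflow webserver --port 8081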
  • After logging in with your airflow username and password, you should see the following webserver UI.
Airflow Home Page
  • The home page lists some example DAGs that ship with Airflow, which you will see when you log in for the first time.
  • If you see this page, congratulations! You have successfully installed Apache Airflow on your system.
  • You can explore the UI, experiment with the DAGs, and see how they work.
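  • As a side note, Airflow 2.2 and later also ship an airflow standalone command that initializes the database, creates a user, and starts the scheduler and webserver in one go; it is handy for quick local experiments, while running the components separately, as above, stays closer to a production setup:
(airflow_env) username@desktop_name:/mnt/c/Users/username/Documents/airflow$ airflow standalone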

Final Thoughts and Closing Comments

There are some vital points many people fail to grasp while pursuing their Data Science or AI journey. If you are one of them and are looking for a structured way to cover them, check out the certification programs provided by INSAID on their website. If you liked this story, I recommend the Global Certificate in Data Science, as it covers the foundations plus machine learning algorithms (basic to advanced).

And that’s it! I hope you liked this explanation of setting up Apache Airflow on Windows using WSL 2 and learned something valuable. Please let me know in the comments section if you have anything to share. I would love to hear your thoughts.

Follow me for more forthcoming articles based on Python, R, Data Science, Machine Learning, and Artificial Intelligence.

If you find this read helpful, hit the Clap 👏. Your encouragement motivates me to keep going and develop more valuable content.

What’s next?

