Setting up Airflow on WSL: Breathe Easy with Data Workflows (No Docker Required!)
So, you want to use Airflow on your Windows machine but don’t want to go through the hassle of setting up a separate Linux server? Fear not, my friend! With WSL, you can easily install and configure Airflow in just a few simple steps.
Prerequisites
Before we begin, make sure you have the following prerequisites installed in your WSL distribution:
Python 3
pip3 (the Python package installer)
If you’re unsure whether you have these prerequisites installed, you can check by running the following commands in your terminal:
python3 --version
pip3 --version
If either command returns an error, install the missing prerequisite before continuing.
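If you’d rather check both tools in one go, here is a minimal sketch of a shell helper. The name check_prereqs is made up for this example; it relies only on the shell’s built-in command -v lookup.

```shell
# check_prereqs is a hypothetical helper name, not part of any tool.
# It prints "missing: <name>" for every command it cannot find on PATH
# and returns non-zero if at least one of them is missing.
check_prereqs() {
    local rc=0
    local cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd"
            rc=1
        fi
    done
    return "$rc"
}

# Usage: check_prereqs python3 pip3
```

When both tools are found, the call prints nothing and returns success.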
Step 1: Get your fingers ready
First things first, make sure you have all your fingers ready to type because we’re about to do some serious command-line action.
Step 2: Update and upgrade
Before we get started, let’s make sure your system is up-to-date. Open up your terminal or WSL shell and run the following command:
sudo apt update && sudo apt upgrade
This refreshes the package index and then upgrades the packages you already have installed to their latest versions.
Step 3: Create Airflow Home
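Next, we need to create a directory where Airflow can store its files. For example, let’s create a directory called AirflowHome in the /c/Users/username directory. You can create this directory by running the following command:
mkdir /c/Users/username/AirflowHome
Don’t forget to replace username with your actual Windows username. Note that a default WSL install mounts the C: drive at /mnt/c, so the /c/Users path only exists once the automount change from Step 6 has taken effect; until then, use /mnt/c/Users/username/AirflowHome, which is the same Windows folder.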
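Since the Windows drive can be mounted at /c or at /mnt/c depending on your WSL configuration, here is a hedged sketch that picks whichever one exists before creating the directory. first_existing_dir is a hypothetical helper written for this example.

```shell
# first_existing_dir is a hypothetical helper name.
# It prints the first argument that is an existing directory
# and returns 1 if none of the arguments is one.
first_existing_dir() {
    local dir
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Usage (replace "username" with your Windows username):
#   users_root=$(first_existing_dir /c/Users /mnt/c/Users) &&
#       mkdir -p "$users_root/username/AirflowHome"
```

Using mkdir -p here means the command does not fail if the directory already exists.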
Step 4: Install Airflow
To install Airflow, simply run the following command in your terminal:
pip3 install apache-airflow
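This installs the latest release of Airflow from PyPI along with its dependencies. The commands in the rest of this guide are from the Airflow 2.x command line, so if pip pulls in a newer major version, some of them may have changed.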
Step 5: Set the AIRFLOW_HOME environment variable
Now that we have our AirflowHome directory, let’s set an environment variable called AIRFLOW_HOME to point to this directory. To do this, open up your terminal and type:
nano ~/.bashrc
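This opens your ~/.bashrc startup file in the nano text editor. Bash reads this file each time a new shell starts, so a variable exported here is available in every terminal session. Add the following line at the end of the file: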
export AIRFLOW_HOME=/c/Users/username/AirflowHome
Don’t forget to replace username with your actual Windows username.
Save the changes by pressing Ctrl+X, then Y to confirm, and finally Enter.
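If you prefer not to edit the file by hand, the same line can be appended from the command line. This is a minimal sketch, assuming a Bash-style startup file; add_airflow_home is a name invented for this example.

```shell
# add_airflow_home is a hypothetical helper name.
# $1 = directory to use as AIRFLOW_HOME
# $2 = startup file to edit (defaults to ~/.bashrc)
# The export line is appended only if the file does not already
# contain one, so running the helper twice does not duplicate it.
add_airflow_home() {
    local home_dir="$1"
    local rc_file="${2:-$HOME/.bashrc}"
    if grep -q '^export AIRFLOW_HOME=' "$rc_file" 2>/dev/null; then
        echo "AIRFLOW_HOME already set in $rc_file"
    else
        printf 'export AIRFLOW_HOME="%s"\n' "$home_dir" >> "$rc_file"
        echo "added AIRFLOW_HOME to $rc_file"
    fi
}

# Usage: add_airflow_home /c/Users/username/AirflowHome
```

The path is written in double quotes so that a Windows username containing spaces still works. Run source ~/.bashrc afterwards, or open a new terminal, to load the variable.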
Step 6: Make directories great again
By default, WSL mounts your Windows drives under /mnt (so C: shows up as /mnt/c), which can be a bit annoying. So, let’s make things easier by mounting them directly under / instead, so that C: shows up as /c. Open /etc/wsl.conf in an editor:
sudo nano /etc/wsl.conf
Then add the following lines:
[automount]
root = /
options = "metadata"
The metadata option lets WSL keep Linux file permissions on files that live on the Windows drive. Save the changes by pressing Ctrl+X, then Y to confirm, and finally Enter.
Step 7: Close and open
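Close your terminal and open it again so that the new AIRFLOW_HOME variable is loaded. The /etc/wsl.conf change only applies once WSL itself restarts; if /c is still missing after you reopen the terminal, run wsl --shutdown from PowerShell or the Windows Command Prompt and then start WSL again.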
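After reopening the terminal, a quick sanity check can save some head-scratching later. The sketch below only looks at the AIRFLOW_HOME variable and whether it points at a real directory; check_airflow_home is a hypothetical name used for illustration.

```shell
# check_airflow_home is a hypothetical helper name.
# It reports whether AIRFLOW_HOME is set and points at an existing
# directory, and returns non-zero if either check fails.
check_airflow_home() {
    if [ -z "${AIRFLOW_HOME:-}" ]; then
        echo "AIRFLOW_HOME is not set"
        return 1
    fi
    if [ ! -d "$AIRFLOW_HOME" ]; then
        echo "AIRFLOW_HOME is not a directory: $AIRFLOW_HOME"
        return 1
    fi
    echo "AIRFLOW_HOME ok: $AIRFLOW_HOME"
}

# Usage: check_airflow_home
```

If the second message appears, the drive is probably still mounted at its old location, which usually means WSL has not restarted yet.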
Step 8: Install missing packages
If you run into any missing package errors, just use pip3 to install them:
pip3 install [package-name]
Step 9: You made it!
Congrats! You’ve successfully installed Airflow on WSL. To make sure everything is working, run the following command:
airflow info
If the output starts with an Apache Airflow section showing a version such as 2.x.x, you’re good to go!
Step 10: Initialize Airflow database
Before we can use Airflow, we need to initialize its database. Run the following command in your terminal:
airflow db init
This will create the necessary tables and structures for Airflow to store and manage its data. On Airflow 2.7 and later, airflow db init is deprecated in favor of airflow db migrate.
Step 11: Start Airflow
Finally, we’re ready to start Airflow. Open up your terminal and run the following command:
airflow webserver
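This will start the Airflow web server and make it available at http://localhost:8080. Open up your web browser and navigate to this URL to access the Airflow web interface. On Airflow 2.x the interface asks you to log in; if you have not created a user yet, you can add one with the airflow users create command.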
Congratulations, you’ve successfully set up Airflow on your WSL environment! Now, you can use Airflow to manage your data workflows and pipelines.
But wait, there’s more!
Bonus: Scheduler and Command Line Interface (CLI)
Now that you have Airflow up and running, let’s take a quick look at some additional features. You can start the Airflow scheduler by running the following command in a new terminal:
airflow scheduler
The scheduler is responsible for triggering your DAGs and running your tasks according to their schedule. With the scheduler running, you can use the Airflow CLI to interact with your DAGs and tasks. For example, you can list your DAGs by running:
airflow dags list
This will display a list of all the DAGs you’ve defined in your dags_folder.
Step 12: Get creative
Now that you’ve set up Airflow, it’s time to get creative! Use Airflow’s Python API or web UI to create and manage your workflows and tasks. And don’t forget to have fun with it!
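To make this concrete, here is a hedged sketch that drops a tiny DAG file into a dags folder. It assumes Airflow 2.4 or later (for the schedule argument) and the default dags folder at $AIRFLOW_HOME/dags. The helper name write_example_dag, the file name hello_wsl.py, and the DAG and task ids are all made up for illustration.

```shell
# write_example_dag is a hypothetical helper name.
# $1 = dags folder to write into (created if it does not exist)
write_example_dag() {
    local dags_dir="$1"
    mkdir -p "$dags_dir"
    # The quoted 'EOF' keeps the shell from expanding anything in the Python code.
    cat > "$dags_dir/hello_wsl.py" <<'EOF'
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# A one-task DAG with no schedule: it only runs when triggered manually.
with DAG(
    dag_id="hello_wsl",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    BashOperator(task_id="say_hello", bash_command="echo 'hello from WSL'")
EOF
    echo "wrote $dags_dir/hello_wsl.py"
}

# Usage: write_example_dag "$AIRFLOW_HOME/dags"
```

Once the scheduler has picked up the file, the hello_wsl DAG should show up in the web interface, where you can trigger it by hand.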
Conclusion
Setting up Airflow on WSL doesn’t have to be a tedious task. With just a few simple commands, you can easily install and configure Airflow on your Windows machine. So, grab a coffee (or your favorite beverage), get your fingers ready, and start creating some awesome workflows with Airflow!