How To Install Apache Airflow

Jackson Bull
4 min read · Jun 20, 2020

Whether you’re an experienced data engineer or just getting started in the industry, you’ve probably heard of Apache Airflow, but what is it? Airflow is a workflow management tool that helps automate complex tasks in your data pipeline, such as Extract, Transform, Load jobs (ETL, for you acronym lovers). It’s a valuable skill to have, but it can be difficult to learn, let alone to set up on your local PC. Explaining how to use Airflow could fill a 500-page book, so I am dedicating this post to showing you how to install it on Windows 10.

Step 1 — Installing Ubuntu

What is Ubuntu, you ask? Ubuntu is a Linux operating system. On Windows 10 it runs through the Windows Subsystem for Linux, which gives you a Linux command line alongside your regular desktop. Most people are used to a nice-looking graphical user interface (GUI) for organizing folders and files, and a terminal can look intimidating at first, but it gives you more control over your computer. Plus, using it makes you feel like you’re in the Matrix! Before you install it, though, make sure that developer mode is activated by opening “Developer Settings” and selecting the “Developer Mode” option.

Search>Developer Settings>select Developer Mode

Secondly, enable the “Windows Subsystem for Linux” option located in Windows Features.

Enabling Windows Subsystem for Linux

Lastly, download the Visual C++ Build Tools.

Now you’re ready to install Ubuntu. After you install it from the Microsoft Store, a terminal will automatically open and prompt you to enter a username and a password. FYI, when you start typing your password, you may not see any characters appearing in the terminal. Don’t worry, your computer isn’t broken; just trust your fingertips.

If you close and reopen the terminal after finishing this last step, just type “bash” to start Ubuntu again. Bash is the command-line shell that lets you communicate with your computer.
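If you want to confirm that the terminal you are typing in really is Ubuntu running under the Windows Subsystem for Linux, here is a minimal sketch. It relies on the kernel version string containing the word "Microsoft" under WSL. The helper name is_wsl and its optional file argument are my own additions for illustration, not part of any tool.

```shell
# Hypothetical helper: succeeds when the kernel version string mentions
# Microsoft, which is the case under WSL. The optional first argument
# (default /proc/version) only exists so the check can be tried on a sample file.
is_wsl() {
    grep -qi 'microsoft' "${1:-/proc/version}" 2>/dev/null
}

if is_wsl; then
    echo "Running inside Windows Subsystem for Linux"
else
    echo "Not running inside WSL"
fi
```

If it reports that you are not inside WSL, you are probably still in a regular Windows prompt and need to type “bash” first.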

Step 2 — Installing Pip

Pip is a package manager for installing software written in Python. It’s what you’ll use to install Apache Airflow. Run the following commands to complete this step:

sudo apt-get install software-properties-common
sudo apt-add-repository universe
sudo apt-get update
sudo apt-get install python-setuptools
sudo apt install python3-pip
sudo -H pip3 install --upgrade pip
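Once those commands finish, it can be worth checking that Python and pip are actually reachable before moving on. The snippet below is a small sketch of that check; have_cmd is a hypothetical helper name I made up, built on the standard `command -v` lookup.

```shell
# Hypothetical helper: succeeds if the named command is found on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report the version of each tool this step is supposed to provide.
for tool in python3 pip3; do
    if have_cmd "$tool"; then
        echo "$tool found: $("$tool" --version 2>&1)"
    else
        echo "$tool is missing; re-run the install commands for this step"
    fi
done
```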

Step 3 — Install Airflow Dependencies

Before installing Apache Airflow, run the following commands to make sure the necessary dependencies are installed. Airflow uses SQLite as its default database. If you want something more scalable, like PostgreSQL, check out Airflow’s documentation. But if you just want to learn the basics without too much complexity, run the commands below and continue to step 4.

sudo apt-get install libmysqlclient-dev
sudo apt-get install libssl-dev
sudo apt-get install libkrb5-dev
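To double-check that the three libraries landed, you can ask the package database directly. This is only a sketch: pkg_installed is a hypothetical helper name, and it assumes a Debian-style system such as Ubuntu where dpkg-query is available.

```shell
# Hypothetical helper: succeeds if dpkg reports the package as fully installed.
# If dpkg-query is unavailable, the package is simply treated as missing.
pkg_installed() {
    dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q 'install ok installed'
}

# Check each dependency from this step.
for pkg in libmysqlclient-dev libssl-dev libkrb5-dev; do
    if pkg_installed "$pkg"; then
        echo "$pkg is installed"
    else
        echo "$pkg is missing"
    fi
done
```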

Step 4 — Installing Apache Airflow

Finally, this is what we’re here for! Just run the following commands and you should be all set.

export AIRFLOW_HOME=~/airflow
pip3 install apache-airflow
pip3 install typing_extensions

# initialize the database (on Airflow 2.x the equivalent command is "airflow db init")
airflow initdb

# start the web server, default port is 8080
airflow webserver -p 8080

# start the scheduler. I recommend opening up a separate terminal window for this step
airflow scheduler

# visit localhost:8080 in the browser and enable the example dag in the home page
Apache Airflow — User Interface
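One thing to keep in mind: an export only lasts for the terminal session it was typed in, so a second terminal opened for the scheduler will not see AIRFLOW_HOME. Airflow falls back to ~/airflow by default, so this mostly matters if you pick a different directory. Here is a minimal sketch for persisting the setting; append_once is a hypothetical helper of my own, not an Airflow or Ubuntu command.

```shell
# Hypothetical helper: append a line to a file only if that exact line
# is not already there, so running it twice does not create duplicates.
append_once() {
    line=$1
    file=$2
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

After defining it, you could run `append_once 'export AIRFLOW_HOME=~/airflow' ~/.bashrc` and then open a new terminal (or run `source ~/.bashrc`) for the change to take effect.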

And there you have it! You can now start playing around with this incredibly powerful tool. Hopefully I’ve reduced the headache of getting it installed so you can save your energy for more productive tasks.
