Setting up Apache-Airflow in Ubuntu

(A Step-by-Step Guide to Setup Apache-Airflow on Linux)

Mukesh Kumar
Accredian
5 min read · Aug 12, 2022


Preface

Previously, you got a high-level overview of Data and Data Pipelines, followed by an Introduction to Apache Airflow. I also explained how to set up Apache Airflow on Windows 10 using WSL 2. But that is not where the series stops. Continuing the series, in this story I will walk you through the installation of Apache Airflow on Ubuntu using a Python-based virtual environment.

If you are a Linux user, scroll down and proceed with the setup instructions. Windows and macOS users can follow my companion setup stories, since meeting these basic requirements is necessary before we go deeper into using Apache Airflow.

Please note that the setup instructions work on both Ubuntu 18.04 and Ubuntu 20.04.

Installation of pip on Ubuntu

  • To set up a virtual environment, we need to install a Python package named virtualenv.
  • We will install it using pip.
  • If pip is not available for Python 3, run the following command in an Ubuntu terminal:
username@desktop_name:~$ sudo apt install python3-pip
[sudo] password for username:
  • Type the Ubuntu password to proceed with the installation.
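Once the installation finishes, it is worth confirming that pip is wired up to Python 3 before going further; a quick sanity check (the version numbers you see will depend on your Ubuntu release):

```shell
# Confirm pip is available for Python 3; both commands should
# report a pip version tied to a Python 3.x interpreter
python3 -m pip --version
pip3 --version
```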

Installing & Setting Up a Virtual Environment

  • After successfully installing pip, we will now install the virtualenv package using the following command:
username@desktop_name:~$ sudo pip3 install virtualenv
[sudo] password for username:
OUTPUT:
Collecting virtualenv
  • Let’s create a new directory (say airflow_workspace) that will contain the virtual environment directory (which we will create next) and the airflow directory (create it manually).
  • To create a virtual environment directory named “airflow_env” inside the “airflow_workspace” directory, execute the following command:
username@desktop_name:~/airflow_workspace$ virtualenv airflow_env
OUTPUT:
created virtual environment CPython3.8.10.final.0-64 in 841ms
.
.
activators BashActivator, CShellActivator, FishActivator, PowerShellActivator, PythonActivator
[Image: airflow_workspace after creating the virtual environment (airflow_env) directory and the airflow directory]
  • To activate the environment, use the following command:
username@desktop_name:~/airflow_workspace$ source airflow_env/bin/activate
  • You will observe that the virtual environment name now precedes the username in the terminal prompt, as shown below:
(airflow_env) username@desktop_name:~/airflow_workspace$
  • It indicates that we have successfully activated the virtual environment.
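Another quick way to confirm the activation worked is to check which interpreter the shell now resolves; a sketch (the exact path will match wherever you created airflow_env):

```shell
# With the venv active, python3 should resolve from inside airflow_env
which python3    # expect something like .../airflow_workspace/airflow_env/bin/python3

# sys.prefix also points at the venv directory while it is active
python3 -c 'import sys; print(sys.prefix)'
```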

Installation of Airflow & Essential Libraries

  • Next, we will install airflow and some additional libraries using the following command:
(airflow_env) username@desktop_name:~/airflow_workspace$ pip3 install "apache-airflow[gcp,sentry,statsd]"
OUTPUT:
Collecting apache-airflow[gcp,sentry,statsd]
.
Installing collected packages: (All the downloaded packages)
  • The installation may take some time. Make sure you have an active internet connection. Grab a cup of coffee while the installation proceeds…
  • After the installation succeeds, we will also install some additional libraries, such as scikit-learn and pyspark, that you might need in the future.
(airflow_env) username@desktop_name:~/airflow_workspace$ pip3 install pyspark
(airflow_env) username@desktop_name:~/airflow_workspace$ pip3 install scikit-learn
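If pip's dependency resolver struggles with Airflow's long dependency list, the official Airflow docs recommend installing against a constraints file pinned to your Airflow and Python versions. A minimal sketch of that approach, assuming Airflow 2.3.3 purely as an example version:

```shell
# Build the constraints URL from the target Airflow version and the
# local Python version (URL pattern from the official install docs)
AIRFLOW_VERSION=2.3.3
PYTHON_VERSION="$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"

pip3 install "apache-airflow[gcp,sentry,statsd]" --constraint "${CONSTRAINT_URL}"
```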

Initialization of Airflow Database

  • Now we will go to the airflow directory and initialize the airflow database using the following commands:
(airflow_env) username@desktop_name:~/airflow_workspace$ cd airflow
(airflow_env) username@desktop_name:~/airflow_workspace/airflow$ airflow db init
OUTPUT:
Modules imported successfully
Initialization done
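One caveat worth knowing: by default, Airflow writes its files to ~/airflow regardless of the current directory. If you want it to use the airflow directory inside airflow_workspace instead, export AIRFLOW_HOME before running `airflow db init`; a sketch:

```shell
# Point Airflow at the workspace directory instead of the default
# ~/airflow (run this in every new terminal, or add it to ~/.bashrc)
export AIRFLOW_HOME="$HOME/airflow_workspace/airflow"
echo "$AIRFLOW_HOME"   # confirm before initializing the database
```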
  • You will observe some new files and directories inside the airflow directory, as shown below in the image.
[Image: the airflow directory after running the ‘airflow db init’ command]
  • It is time to create a dags folder. All future DAGs will be stored here and picked up by the Airflow components.
(airflow_env) username@desktop_name:~/airflow_workspace/airflow$ mkdir dags
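To have something of our own to look at later in the UI, we can drop a minimal DAG into the new folder. A sketch, where the file name hello_dag.py and the task id are arbitrary examples and the imports assume Airflow 2.x:

```shell
# Write a minimal single-task DAG into the dags folder
mkdir -p dags
cat > dags/hello_dag.py << 'EOF'
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="hello_dag",
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,  # trigger manually from the UI
    catchup=False,
) as dag:
    BashOperator(task_id="say_hello", bash_command="echo hello")
EOF
```

The scheduler periodically scans this folder, so the DAG should appear in the web UI within a minute or so of the file landing there.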

Creating a New Airflow User

  • On the first startup of Airflow, you must create a user.
  • This can be done with the “users create” subcommand.
  • To create a user named admin with the Admin role, run the following command:
(airflow_env) username@desktop_name:~/airflow_workspace/airflow$ airflow users create --username admin --password your_password --firstname your_first_name --lastname your_last_name --role Admin --email your_email@domain.com
  • Run the following command to check if the user was created successfully:
(airflow_env) username@desktop_name:~/airflow_workspace/airflow$ airflow users list
OUTPUT:
id | username | email                 | first_name      | last_name      | roles
===+==========+=======================+=================+================+======
 1 | admin    | your_email@domain.com | your_first_name | your_last_name | Admin

Running of the Airflow Scheduler and Webserver

  • Now we will start the airflow scheduler using the airflow scheduler command after activating the virtual environment:
(airflow_env) username@desktop_name:~/airflow_workspace/airflow$ airflow scheduler
  • Open a new terminal, activate the virtual environment, go to the airflow directory, and start the web server.
username@desktop_name:~/airflow_workspace$ source airflow_env/bin/activate
(airflow_env) username@desktop_name:~/airflow_workspace$ cd airflow
(airflow_env) username@desktop_name:~/airflow_workspace/airflow$ airflow webserver
  • Once the scheduler and webserver have initialized, open any browser and go to http://localhost:8080/.
  • Port 8080 is the default port for Airflow, and you should see the login page:
  • If the page does not load, or the port is reported as occupied by another program, open the airflow.cfg file and change the port number.
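If you suspect a port clash, you can check before restarting; a sketch (ss ships with iproute2 on Ubuntu, and the webserver's --port flag lets you avoid editing airflow.cfg):

```shell
# See whether anything is already listening on port 8080
ss -tln | grep ':8080' || echo "port 8080 is free"

# Alternatively, start the webserver on another port directly:
# airflow webserver --port 8081
```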
  • After logging in with the Airflow username and password we created earlier, you should see the following webserver UI.
  • These are some prebuilt example DAGs you will observe when you log in for the first time.
  • If you see this page, congratulations! You have successfully installed Apache Airflow on your system.
  • You can explore the UI, experiment with the DAGs, and see how they work.

And that’s it. I hope you enjoyed this walkthrough of setting up Apache Airflow on Ubuntu and learned something valuable. Please let me know in the comment section if you have anything to share with me. I would love to know your thoughts.

Final Thoughts and Closing Comments

Many people overlook some vital points as they pursue their journey in Computer Science, Data Science, and AI. If that sounds like you, follow me and subscribe for forthcoming articles related to Python, Computer Science, Data Science, Machine Learning, and Artificial Intelligence.

If you find this read helpful, hit the Clap 👏. Your encouragement keeps me going and helps me develop more content like this.

What’s next?


Mukesh Kumar
Accredian

Data Scientist, having a robust math background, skilled in predictive modeling, data processing, and mining strategies to solve challenging business problems.