Setting up Apache Airflow on macOS

(A Step-by-Step Guide to Set Up Apache Airflow on macOS)

Mukesh Kumar
Accredian


Written in collaboration with Hiren Rupchandani

Preface

In the previous stories, you learned a high-level overview of Data and Data Pipelines, followed by an Introduction to Apache Airflow. Continuing the Apache Airflow series, I will now walk you through the installation of Apache Airflow on macOS using a Python-based virtual environment.

I assume that you have Homebrew installed on your system. If not, you can refer to the Homebrew website (https://brew.sh) to install it.

Installation of pip on macOS

  • pip ships with Homebrew’s Python, so if it is not installed yet, run the following command in the Mac terminal:
username@device_name ~ % brew install python
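  • Before moving on, you can quickly confirm that pip is available (the exact version shown will differ on your machine):
username@device_name ~ % pip3 --version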

Installing & Setting Up a Virtual Environment

  • After successfully installing pip, we will now install the virtualenv package using the following command:
username@device_name~ % sudo pip3 install virtualenv
  • Let’s create a new directory (say airflow_workspace) that will contain the virtual environment directory (which we will create next) and the airflow directory (create it manually).
  • To create a virtual environment named “airflow_env” inside the “airflow_workspace” directory, execute the following command:
username@device_name~/airflow_workspace % virtualenv airflow_env
OUTPUT:
created virtual environment CPython3.8.10.final.0–64 in 841ms
.
.
activators BashActivator, CShellActivator, FishActivator, PowerShellActivator, PythonActivator
airflow_workspace after creating the virtual environment (airflow_env) directory and airflow directory
  • To activate the environment use the following command:
username@device_name~/airflow_workspace % source airflow_env/bin/activate
  • You will observe that the virtual environment name now precedes the username in the terminal prompt, as shown below:
(airflow_env) username@device_name~/airflow_workspace %
  • It indicates that we have successfully activated the virtual environment.
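  • As a quick sanity check, you can also verify that the shell now picks up the interpreter inside airflow_env rather than the system Python (the exact path depends on where you created airflow_workspace):
(airflow_env) username@device_name~/airflow_workspace % which python3
/Users/username/airflow_workspace/airflow_env/bin/python3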

Installation of Airflow & Essential Libraries

  • Next, we will install Airflow along with a few extras using the following command (the quotes keep zsh, the default macOS shell, from misinterpreting the square brackets):
(airflow_env) username@device_name~/airflow_workspace % pip3 install 'apache-airflow[gcp,sentry,statsd]'
OUTPUT:
Collecting apache-airflow[gcp,sentry,statsd]
.
Installing collected packages: (All the downloaded packages)
  • The installation may take some time. Make sure you have an active internet connection. Grab a cup of coffee while the installation proceeds…
  • After a successful installation, we will also install some additional libraries, like scikit-learn and PySpark, that you might need in the future:
(airflow_env) username@device_name~/airflow_workspace % pip3 install pyspark
(airflow_env) username@device_name~/airflow_workspace % pip3 install scikit-learn
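  • Once pip finishes, a simple way to confirm that Airflow was installed correctly is to ask the CLI for its version (the number you get depends on the release pip resolved):
(airflow_env) username@device_name~/airflow_workspace % airflow version
  • If you want fully reproducible installs, the official Airflow documentation also recommends installing against a constraints file pinned to your Airflow and Python versions; see the installation guide in the Airflow docs for the exact command.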

Initialization of Airflow Database

  • Now we will go to the airflow directory and initialize the Airflow database. Airflow keeps its configuration and metadata wherever the AIRFLOW_HOME environment variable points (it defaults to ~/airflow), so export AIRFLOW_HOME to this directory first (for example, export AIRFLOW_HOME=~/airflow_workspace/airflow) if you want the generated files to land here. Then run the following commands:
(airflow_env) username@device_name~/airflow_workspace % cd airflow
(airflow_env) username@device_name~/airflow_workspace/airflow % airflow db init
OUTPUT:
Modules imported successfully
Initialization done
  • You will observe some new files (such as airflow.cfg and airflow.db) and a logs directory inside the airflow directory.
  • It is time to create a dags folder. All future DAGs will be stored here and picked up by the Airflow components (a minimal example DAG is sketched right after this step).
(airflow_env) username@device_name~/airflow_workspace/airflow % mkdir dags
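  • To give the scheduler something to pick up later, you can drop a minimal DAG file into this new dags folder. The sketch below is only an illustrative example for Airflow 2.x (the file name hello_dag.py, the dag_id, and the schedule are arbitrary placeholders, not part of the original setup):
# dags/hello_dag.py - a minimal illustrative DAG, not from the original guide
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# A single Bash task that prints a greeting once a day.
with DAG(
    dag_id="hello_airflow",           # arbitrary example dag_id
    start_date=datetime(2022, 1, 1),  # any past date works
    schedule_interval="@daily",       # run once per day
    catchup=False,                    # skip backfilling old runs
) as dag:
    say_hello = BashOperator(
        task_id="say_hello",
        bash_command="echo 'Hello from Airflow!'",
    )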

Creating a New Airflow User

  • Developers must create a new user on the first startup of Airflow.
  • This can be done with the help of the “users create” command.
  • To create a new user with the username admin and the Admin role, run the following command:
(airflow_env) username@device_name~/airflow_workspace/airflow % airflow users create --username admin --password your_password --firstname your_first_name --lastname your_last_name --role Admin --email your_email@some.com
  • Run the following command to check if the user was created successfully:
(airflow_env) username@device_name~/airflow_workspace/airflow % airflow users list
OUTPUT:
id | username | email | first_name | last_name | roles
===+==========+=======+============+===========+======
1 | admin | your_email@some.com | your_first_name | your_last_name | Admin
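  • The same command works for extra accounts with more limited permissions; Airflow ships with a read-only Viewer role, for instance. The account details below are placeholders, not something the original setup requires:
(airflow_env) username@device_name~/airflow_workspace/airflow % airflow users create --username viewer --password some_password --firstname some_first_name --lastname some_last_name --role Viewer --email viewer_email@some.com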

Running the Airflow Scheduler and Webserver

  • Now we will start the airflow scheduler using the airflow scheduler command after activating the virtual environment:
(airflow_env) username@device_name~/airflow_workspace/airflow % airflow scheduler
  • Open a new terminal, activate the virtual environment, go to the airflow directory, and start the web server.
username@device_name~/airflow_workspace % source airflow_env/bin/activate
(airflow_env) username@device_name~/airflow_workspace % cd airflow
(airflow_env) username@device_name~/airflow_workspace/airflow % airflow webserver
  • Once the scheduler and webserver get initialized, open any browser and go to http://localhost:8080/.
  • Port 8080 is the default port for Airflow, and you should see the following page:
Airflow Login Page
  • If it doesn’t work, or the port is already occupied by another program, open the airflow.cfg file and change the port number (or start the webserver on a different port, as shown after this list).
  • After logging in with the Airflow username and password we created earlier, you should see the following webserver UI.
Airflow Home Page
  • These are some example DAGs that ship with Airflow, which you will see when you log in for the first time.
  • If you see this page, congratulations! You have successfully installed Apache Airflow on your system.
  • You can explore the UI, experiment with the DAGs, and see how they work.
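  • If port 8080 is taken and you would rather not edit airflow.cfg, the webserver can also be started on another port straight from the command line (8081 below is just an example):
(airflow_env) username@device_name~/airflow_workspace/airflow % airflow webserver --port 8081
  • When you are done experimenting, stop the scheduler and webserver with Ctrl+C in their terminals and run deactivate to leave the virtual environment.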

Final Thoughts and Closing Comments

There are some vital points many people fail to understand while pursuing their Data Science or AI journey. If you are one of them and are looking for a way to counterbalance these cons, check out the certification programs provided by INSAID on their website. If you liked this story, I recommend the Global Certificate in Data Science, because it covers your foundations plus machine learning algorithms (basic to advanced).

And that’s it. I hope you liked this explanation of setting up Apache Airflow on macOS and learned something valuable. Please let me know in the comment section if you have anything to share with me. I would love to know your thoughts.

Follow me for more forthcoming articles based on Python, R, Data Science, Machine Learning, and Artificial Intelligence.

If you find this read helpful, then hit the Clap👏. Your encouragement will catalyze inspiration to keep me going and develop more valuable content.

What’s next?
