How to Install a Custom Airflow Image on Docker (Ubuntu)
Introduction
Apache Airflow is a powerful platform for orchestrating and scheduling complex workflows. I'm installing it on Docker because containerization isolates Airflow and its dependencies from the host system and from other containers. This avoids dependency conflicts and keeps the Airflow environment consistent. In my case, I have to customize the Apache Airflow image because I need some extra Python libraries installed as well.
In this tutorial, I will show the steps to install a custom Apache Airflow image on Docker in Ubuntu.
Prerequisites
- Ubuntu operating system installed on your machine.
- A text editor such as VS Code.
- Docker installed and running on your Ubuntu system. You can follow the official Docker installation guide for Ubuntu (link: https://docs.docker.com/engine/install/ubuntu/) to set up Docker correctly.
1. Create an airflow directory, and inside it create the dags, config, logs, and plugins directories.
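The directory layout described above can be created in one go. This is a minimal sketch that assumes the project lives in ./airflow relative to where you run it. The AIRFLOW_UID line is not part of this article's steps; it follows the official Airflow Docker quick start, which suggests it on Linux so that files written to the mounted directories are owned by your user rather than root.

```shell
# Assumption: the project directory is ./airflow; adjust the path to taste.
mkdir -p ./airflow/dags ./airflow/config ./airflow/logs ./airflow/plugins

# From the official Airflow quick start (not from this article):
# record the host user id so mounted files are not created as root.
echo "AIRFLOW_UID=$(id -u)" > ./airflow/.env

# Show what was created, including the hidden .env file.
ls -A ./airflow
```

The remaining steps assume you run every command from inside this airflow directory.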
2. Create a requirements.txt file that lists the required Python libraries:
pandas
oauth2client==4.1.3
google-api-python-client==2.45.0
google-cloud-bigquery==3.2.0
google-cloud-storage==2.5.0
fastavro==1.6.1
gcsfs==2022.10.0
yfinance==0.2.22
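As a requirements file grows, it is easy to list the same package twice with different version pins. The sketch below prints any package name that appears more than once. The function name find_duplicate_requirements is made up for this example, and the check is deliberately simple: it only strips version specifiers and lowercases names, so it does not treat "-" and "_" as equivalent the way pip does.

```shell
# Hypothetical helper: print package names listed more than once.
find_duplicate_requirements() {
  # $1 = path to a requirements file
  # Drop comment and blank lines, strip version specifiers,
  # lowercase the names, then keep only the repeated ones.
  grep -v -E '^[[:space:]]*(#|$)' "$1" \
    | sed -E 's/[[:space:]]*[=<>!~;].*$//' \
    | tr '[:upper:]' '[:lower:]' \
    | sort \
    | uniq -d
}
```

Running find_duplicate_requirements requirements.txt prints nothing when every package is listed once.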
3. Create a Dockerfile that contains the instructions for building the custom Airflow image:
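# Start from the official Airflow image
FROM apache/airflow:2.6.2
# Copy the DAG files from the build context into the image
COPY dags ./dags
# Install the extra Python libraries listed in requirements.txt
COPY requirements.txt requirements.txt
RUN pip3 install --no-cache-dir -r requirements.txt
USER airflow
# Ports for the webserver (8080), worker log server (8793), and Flower (5555)
EXPOSE 8080
EXPOSE 8793
EXPOSE 5555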
4. Build the Docker image from the VS Code terminal with this command:
docker build -t apache-airflow:aldi .
This command builds the Docker image from the Dockerfile in the current directory and tags it as apache-airflow:aldi.
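Once the build finishes, it can be worth confirming that the extra libraries really are importable inside the image. The sketch below does this in a single throwaway container. The function name check_image_imports is made up for this example, and it assumes the image passes a leading python argument straight through to the interpreter, which the official apache/airflow entrypoint does.

```shell
# Hypothetical helper: try to import each named module inside the image.
check_image_imports() {
  # $1 = image tag, remaining arguments = Python module names
  image="$1"
  shift
  docker run --rm "$image" python -c '
import importlib
import sys

failed = False
for name in sys.argv[1:]:
    try:
        importlib.import_module(name)
        print("ok:", name)
    except ImportError as exc:
        failed = True
        print("MISSING:", name, "-", exc)
sys.exit(1 if failed else 0)
' "$@"
}
```

For the libraries in this tutorial, a call might look like check_image_imports apache-airflow:aldi pandas googleapiclient oauth2client google.cloud.bigquery google.cloud.storage fastavro gcsfs yfinance. Note that the arguments are import names, which differ from some of the package names in requirements.txt.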
5. Fetch docker-compose.yaml with this command:
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.6.2/docker-compose.yaml'
If you want to know more about docker-compose.yaml, please refer to the official Apache Airflow documentation.
6. Modify docker-compose.yaml: replace the image name with the name of the Docker image built in step 4 (apache-airflow:aldi).
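The image swap can also be scripted rather than done by hand. This is a rough sketch that assumes the stock 2.6.2 compose file, where the Airflow image line reads image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.6.2}, and GNU sed as shipped with Ubuntu. The function name set_airflow_image is made up for this example.

```shell
# Hypothetical helper: point the Airflow image line at a custom tag.
set_airflow_image() {
  # $1 = compose file, $2 = image tag to use
  # Only lines that mention apache/airflow are rewritten, so other
  # services (postgres, redis) keep their own images.
  # A backup of the original file is left next to it with a .bak suffix.
  sed -i.bak -E "s|^([[:space:]]*image:[[:space:]]*).*apache/airflow.*$|\1$2|" "$1"
}
```

A call such as set_airflow_image docker-compose.yaml apache-airflow:aldi makes the edit in place. Because the stock file reads the AIRFLOW_IMAGE_NAME variable, setting AIRFLOW_IMAGE_NAME=apache-airflow:aldi in the environment or in a .env file next to the compose file should have the same effect without touching the file.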
7. Start the services defined in docker-compose.yaml with this command (the -f option has to come before up):
docker-compose -f docker-compose.yaml up
8. After the deployment has finished, verify the status of the containers from another terminal with the docker ps command:
docker ps
The Apache Airflow installation is now done, and you can view the Airflow UI (webserver) by visiting http://localhost:8080 in your web browser.
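The webserver can take a minute or two to come up after the containers start. Instead of refreshing the browser, you can poll the webserver's /health endpoint, which is the same endpoint the stock compose file uses for its health check. The sketch below is one way to do that; the function name wait_for_url and the 2-second interval are choices made for this example.

```shell
# Hypothetical helper: poll a URL until it answers successfully.
wait_for_url() {
  # $1 = URL to poll, $2 = maximum number of attempts (default 30)
  url="$1"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # --fail makes curl return non-zero on HTTP errors such as 503
    if curl --silent --fail --output /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  return 1
}
```

For example, wait_for_url http://localhost:8080/health && echo "Airflow is up" returns as soon as the webserver responds, or gives up after about a minute.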
Conclusion
In this tutorial, I showed how to install a custom Apache Airflow image on Docker in Ubuntu. By containerizing Airflow, you can manage and deploy workflows in a consistent and reproducible manner. Docker provides an efficient and portable way to run Airflow, which makes it a popular choice among developers and data engineers.
That's it for the article. Feel free to connect with me on LinkedIn if you have any further questions.