Streamlining Airflow Deployment: Automating CI/CD with GitHub Actions

Rajesh Ku. Rout
May 16, 2023


Are you tired of manually deploying your Airflow code changes to your EC2 instance? Look no further than GitHub Actions for continuous integration and deployment (CI/CD) automation. In this blog, I’ll walk you through how to set up GitHub Actions to automate the deployment of your Airflow code to your EC2 instance.

I’ll guide you through the steps to set up a GitHub Action workflow for your Airflow code, including creating a new workflow, defining the necessary environment variables, and configuring the deployment steps.

With my detailed instructions and examples, you’ll be able to seamlessly integrate GitHub Actions into your Airflow development process and easily deploy your code changes to your EC2 instance. Say goodbye to manual deployments and hello to more efficient and streamlined development with GitHub Actions.

By following the steps outlined below in this blog, you can create an SSH key pair, securely store your private key as a secret in GitHub, and configure your EC2 instance to accept the public key. With this setup in place, you can leverage the power of GitHub Actions to copy your Airflow code files to your EC2 instance using secure SSH authentication.

Creating an SSH Key Pair

  1. Open your terminal or command prompt and type the ssh-keygen command.
ssh-keygen command

Follow the prompts to generate a new SSH key pair.

2. Press ENTER at every prompt to create the key pair in the default location.

If you want to specify a custom location, provide a file path at the corresponding prompt; likewise, enter a passphrase at its prompt if you want one.

Press ENTER after each prompt for the default location and No passphrase
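If you prefer a non-interactive one-liner, the same key pair can be generated in a single command. This is a sketch: the key type, size, and comment are assumptions you can adjust, and the scratch directory stands in for your real ~/.ssh.

```shell
# Generate an RSA key pair non-interactively into a scratch directory.
# On your machine you would use -f ~/.ssh/id_rsa instead of the temp path.
KEYDIR=$(mktemp -d)
ssh-keygen -t rsa -b 4096 -N "" -C "github-actions-deploy" -q -f "$KEYDIR/id_rsa"
ls "$KEYDIR"   # id_rsa (private key) and id_rsa.pub (public key)
```

Here -N "" means an empty passphrase, matching the "press ENTER at every prompt" flow above.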

Adding the Private Key to GitHub Secrets

  1. Go to the repository that holds your Airflow code (or the repository you want to use for it).
  2. Click on “Settings” at the top of the page, then click on “Secrets” in the left-hand sidebar.
Go to your Repository Settings
Click → Secrets and variables

3. Click on Actions → New repository secret

4. Enter a name for your secret (e.g., SSH_PRIVATE_KEY) and paste the contents of your private key into the "Value" field.

a) Open Windows Explorer and paste the path below to locate the id_rsa file (the private key — note that id_rsa.pub is the public key, which we will use later on the EC2 instance) as shown below.

C:\Users\rajes\.ssh        (replace 'rajes' with your Windows username)
Paste the path: C:\Users\rajes\.ssh
Open the id_rsa file

b) Copy the content of the id_rsa file.

Copy the Content of the File

c) Paste the contents of your private key into the “Value” field of the Secret Variable (SSH_PRIVATE_KEY)

Paste your ssh private key → Click Add Secret

d) Click “Add secret” to save the secret.

SSH_PRIVATE_KEY

We have successfully added a repository secret.

Configuring the Public Key on EC2 Instance

  1. Log in to your EC2 instance using SSH.
Login to EC2 using SSH

You can log in to AWS EC2 in several ways; here I have used the ssh command.

2. Run the mkdir ~/.ssh command to create a new ~/.ssh directory (if it does not already exist).

Create new folder:- mkdir ~/.ssh

3. Run the touch ~/.ssh/authorized_keys command to create a new authorized_keys file.

4. Open the authorized_keys file with your preferred text editor.

cd ~/.ssh
ls                         # authorized_keys
vim authorized_keys        # I am using the vim editor to edit the file

# In case you don't have vim installed, you can install it with:
sudo apt install vim       # run as the root user or prefix with sudo

Copy the ssh public key and paste it into the authorized_keys file on your EC2 instance.

Open the id_rsa.pub file (the public key), copy the content, and paste it into your EC2 authorized_keys file
Paste it into:- vim authorized_keys

To save and close the file:

Press the ESC key → Type :wq! → Then press ENTER

5. Set the appropriate file permissions on the ~/.ssh directory and authorized_keys file with the following commands:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
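The EC2-side steps above can be sketched as one script. The DEMO_HOME scratch directory and the placeholder public key are assumptions for illustration; on the real instance you would work in ~ and paste your actual id_rsa.pub content:

```shell
# Scratch directory standing in for the EC2 user's home; use ~ on the real instance.
DEMO_HOME=$(mktemp -d)
# Placeholder public key: replace with the real contents of your id_rsa.pub.
PUBKEY="ssh-rsa AAAAB3NzaC1yc2E...placeholder github-actions-deploy"

mkdir -p "$DEMO_HOME/.ssh"                                    # create the .ssh directory
printf '%s\n' "$PUBKEY" >> "$DEMO_HOME/.ssh/authorized_keys"  # append the public key
chmod 700 "$DEMO_HOME/.ssh"                                   # accessible only by the owner
chmod 600 "$DEMO_HOME/.ssh/authorized_keys"                   # readable/writable only by the owner
```

The strict permissions matter: the OpenSSH server refuses key-based logins if the ~/.ssh directory or authorized_keys file is writable by other users.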

Configuring GitHub Actions to use SSH Key

  1. Create a new GitHub Action workflow file or modify an existing one that needs to add files to your EC2 instance via SSH.

To create a new workflow file, follow the below steps:

Click on Actions → Click on Set up a workflow yourself

Actions → set up a workflow yourself

Give a name to your workflow file and commit the changes

Give a name to the File → Commit changes
Add a commit message → Commit Changes

Once we commit the changes, we can see that a new folder .github/workflows has been created with the “airflow_env.yml” file inside it, as shown below.

2. Create a new folder ‘airflow-scripts’ at the root of your repository

→ Click on Add file → Create new file

We are creating this folder to hold our Airflow scripts

Add File → Create new File

Enter airflow-scripts/Sample_ETL_Dag.py as the file name — typing the slash makes GitHub create the folder automatically.

We have created a new folder and added a new file named “Sample_ETL_Dag.py”

Then commit the new changes.
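If you prefer the command line over the GitHub web UI, the same folder and file can be created in a local clone and pushed. This is a sketch: the temp directory stands in for your cloned repository, and the final push is left commented out.

```shell
# Temp directory standing in for a local clone of your Airflow repository.
REPO=$(mktemp -d)
cd "$REPO"
git init -q .

# Create the folder and the (empty) DAG file, then commit.
mkdir -p airflow-scripts
touch airflow-scripts/Sample_ETL_Dag.py
git add airflow-scripts/Sample_ETL_Dag.py
git -c user.name=demo -c user.email=demo@example.com commit -qm "Add Sample_ETL_Dag.py"

# In your real repository, this push is what triggers the workflow:
# git push origin main
```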

3. Copy and paste the following code snippet into the airflow_env.yml workflow file:

name: Deploy to EC2 Instance

# Controls when the action will run.
on:
  push:
    branches:
      - main
    paths:
      - 'airflow-scripts/**' # the folder we created for the Airflow scripts
  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v2

      - name: Copy file via ssh key
        uses: appleboy/scp-action@v0.1.4
        with:
          host: ec2-11-111-111-111.compute-1.amazonaws.com # your EC2 host name
          username: airflow # your EC2 username
          key: ${{ secrets.SSH_PRIVATE_KEY }} # the secret variable we created
          source: airflow-scripts/Sample_ETL_Dag.py # our source file, including its folder
          strip_components: 1 # drop the airflow-scripts/ prefix so the file lands directly in dags/
          target: /home/airflow/airflow/dags/ # your target folder in EC2 where you want to create the file

The given code is a GitHub Actions workflow named “Deploy to EC2 Instance” that is triggered under specific conditions and performs a file copy operation to an EC2 instance using SSH key authentication. Here is an explanation of the code in a structured format:

on:

  • push: Specifies that the workflow will run when a push occurs in the specified branches.
  • branches: Defines the branch name(s) for which the workflow will be triggered (e.g., main).
  • paths: Specifies the folder path(s) that, if modified, will trigger the workflow (e.g., 'airflow-scripts/**').
  • workflow_dispatch: Allows manual triggering of the workflow from the Actions tab in the GitHub repository.

jobs:

  • deploy: Defines a job named “deploy” that will run the steps within it.
  • runs-on: Specifies the operating system for the job (e.g., ubuntu-latest).

steps:

  • Checkout repository: Uses the actions/checkout action to fetch the repository content.
  • copy file via ssh key: Uses the appleboy/scp-action action to copy a file to the EC2 instance via SSH.
  • host: Specifies the EC2 instance hostname or IP address.
  • username: Specifies the username used to authenticate with the EC2 instance.
  • key: Refers to the SSH_PRIVATE_KEY secret variable created in GitHub Secrets.
  • source: Specifies the path of the source file to be copied, relative to the repository root.
  • target: Specifies the target folder on the EC2 instance where the file will be created.

By configuring this workflow, any change pushed to the specified branch ( main ) that touches the specified folder ( airflow-scripts ) will trigger the workflow. It will then copy the specified file to the target folder on the EC2 instance using SSH key authentication.

When you push something into the airflow-scripts folder, the workflow will be triggered, and it will copy the Sample_ETL_Dag.py file to the specified target folder on your EC2 instance.

Deploy to EC2 Instance Workflow is running

Once it has executed successfully, a copy of the Sample_ETL_Dag.py file will be created in the specified target folder on your EC2 instance, as shown below.

This workflow allows for seamless deployment of the Airflow DAG file to the remote server for execution. By automating this process, you can ensure that the latest version of your DAG is always available on the EC2 instance, enabling efficient and up-to-date task scheduling and workflow management in Airflow.

In conclusion, utilizing GitHub Actions for CI/CD of Airflow code in an EC2 instance can greatly streamline your development process. By automating the deployment of your Airflow code changes, you can save valuable time and effort that would otherwise be spent on manual deployments. GitHub Actions provides a powerful and flexible platform for creating workflows that integrate seamlessly with your repository, allowing you to trigger deployments based on specific events, such as pushes to relevant branches or modifications to specific folders.

Embrace the potential of GitHub Actions for CI/CD of Airflow code in your EC2 instance, and unlock a world of streamlined development and efficient deployment, empowering you to take your Airflow projects to new heights.

As an aspiring data engineer, take your skills to the next level with the Udemy SnowPro Core Certification Practice Set for 2023. Designed to help you excel in the SnowPro Core Certification exam, this practice set provides real-world scenarios and hands-on exercises to enhance your understanding of Snowflake data warehousing. Don’t miss this opportunity to gain practical experience, reinforce your knowledge, and pave the way for a successful career in data engineering. Enroll today and unlock your full potential!

https://www.udemy.com/course/snowflake-snowpro-core-certification-2023-practice-sets/?referralCode=D9755936C300A61FA7A4


Rajesh Ku. Rout

Apache Airflow Global Champion | Snowflake Squad Member 2024 | Data Scientist | www.linkedin.com/in/rajeshrout97