Streamlining Airflow Deployment: Automating CI/CD with GitHub Actions
Are you tired of manually deploying your Airflow code changes to your EC2 instance? Look no further than GitHub Actions for continuous integration and deployment (CI/CD) automation. In this blog, I’ll walk you through how to set up GitHub Actions to automate the deployment of your Airflow code to your EC2 instance.
I’ll guide you through the steps to set up a GitHub Action workflow for your Airflow code, including creating a new workflow, defining the necessary environment variables, and configuring the deployment steps.
With my detailed instructions and examples, you’ll be able to seamlessly integrate GitHub Actions into your Airflow development process and easily deploy your code changes to your EC2 instance. Say goodbye to manual deployments and hello to more efficient and streamlined development with GitHub Actions.
By following the steps outlined below in this blog, you can create an SSH key pair, securely store your private key as a secret in GitHub, and configure your EC2 instance to accept the public key. With this setup in place, you can leverage the power of GitHub Actions to copy your Airflow code files to your EC2 instance using secure SSH authentication.
Creating an SSH Key Pair
1. Open your terminal or command prompt and run the ssh-keygen command. Follow the prompts to generate a new SSH key pair.
2. Press ENTER at every prompt to create the key pair in the default location. If you want to specify a custom location by providing a file path, or want to add a passphrase, enter it at the respective prompt.
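For reference, a typical invocation looks like the sketch below; the key type, size, and comment string are only illustrative choices, and accepting the default prompts writes the pair to ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub.
ssh-keygen -t rsa -b 4096 -C "github-actions-airflow-deploy"
# Press ENTER at the prompts to accept the default path and an empty passphrase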
Adding the Private Key to GitHub Secrets
1. Go to the repository that contains your Airflow code, or the repository you want to use for it.
2. Click on “Settings” at the top of the page, then click on “Secrets” in the left-hand sidebar.
3. Click on Actions → New repository secret.
4. Enter a name for your secret (e.g., SSH_PRIVATE_KEY) and paste the contents of your private key into the “Value” field.
a) Open Windows Explorer and paste the path below into the address bar to locate the id_rsa file (your private key).
C:\Users\rajes\.ssh -- 'rajes' : Replace with your username
b) Copy the contents of the id_rsa file (the private key, not id_rsa.pub). If you prefer the terminal, see the commands after this list.
c) Paste the contents of your private key into the “Value” field of the secret variable (SSH_PRIVATE_KEY).
d) Click “Add secret” to save the secret.
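If you would rather copy the key from a terminal, the commands below simply print it so you can paste it into the secret. This assumes the key was generated in the default location; the second command is for the classic Windows cmd shell.
cat ~/.ssh/id_rsa
# Windows (cmd):
type %USERPROFILE%\.ssh\id_rsa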
We have successfully added a repository secret.
Configuring the Public Key on EC2 Instance
1. Log in to your EC2 instance using SSH.
You can log in to AWS EC2 in different ways; I have done it using the ssh command.
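For example, with the key pair attached to the instance and the same placeholder host name and username used in the workflow later in this post (replace the key path, username, and host with your own values):
ssh -i /path/to/your-ec2-keypair.pem airflow@ec2-11-111-111-111.compute-1.amazonaws.com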
2. Run the mkdir ~/.ssh command to create a new ~/.ssh directory.
3. Run the touch ~/.ssh/authorized_keys command to create a new authorized_keys file.
4. Open the authorized_keys file with your preferred text editor.
vim authorized_keys --( I am using the Vim editor to edit the file )
-- In case you don't have Vim installed, you can install it with:
sudo apt install vim --( switch to the root user, or use sudo, to install it )
Copy the SSH public key (the contents of id_rsa.pub) and paste it into the authorized_keys file on your EC2 instance.
To save and close the file, press the ESC button --> type :wq! --> then press Enter.
5. Set the appropriate file permissions on the ~/.ssh directory and authorized_keys file with the following commands:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
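Before wiring up the workflow, it is worth checking from your local machine that key-based login works with the newly generated key. A quick sanity check, assuming the default key location and the placeholder username and host used in this post:
ssh -i ~/.ssh/id_rsa airflow@ec2-11-111-111-111.compute-1.amazonaws.com
# You should get a shell without being prompted for a password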
Configuring GitHub Actions to use SSH Key
1. Create a new GitHub Actions workflow file, or modify an existing one, that copies files to your EC2 instance via SSH.
To create a new workflow file, follow the steps below:
Click on Actions → Click on Set up a workflow yourself
Give your workflow file a name and commit the changes.
Once we commit the changes, we can see that a new “.github/workflows” folder has been created with the “airflow_env.yml” file inside it, as shown below.
2. Create a new folder ‘airflow-scripts’ in your repository at the root location
→ Click on Add File → Create New File
We are creating this folder to keep our Airflow scripts.
Enter airflow-scripts/Sample_ETL_Dag.py as the file name.
We have created a new folder and added a new file named “Sample_ETL_Dag.py”.
Then commit the new changes.
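The contents of Sample_ETL_Dag.py are not the focus of this post; any valid DAG file will work. As a placeholder, a minimal sketch could look like the following, where the DAG id, schedule, and task are purely illustrative:
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_transform_load():
    # Placeholder ETL logic; replace with your real extract/transform/load steps
    print("Running the sample ETL step")


with DAG(
    dag_id="sample_etl_dag",          # illustrative DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",       # illustrative schedule
    catchup=False,
) as dag:
    run_etl = PythonOperator(
        task_id="run_sample_etl",
        python_callable=extract_transform_load,
    )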
3. Copy and paste the following code snippet into the airflow_env.yml workflow file:
name: Deploy to EC2 Instance

# Controls when the action will run.
on:
  push:
    branches:
      - main
    paths:
      - 'airflow-scripts/**' # This is the folder we created for our Airflow scripts
  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v2

      - name: copy file via ssh key
        uses: appleboy/scp-action@v0.1.4
        with:
          host: ec2-11-111-111-111.compute-1.amazonaws.com # Your EC2 host name
          username: airflow # Your EC2 username
          key: ${{ secrets.SSH_PRIVATE_KEY }} # Secret variable we created
          source: airflow-scripts/Sample_ETL_Dag.py # Our source file inside the airflow-scripts folder
          strip_components: 1 # Drop the airflow-scripts/ prefix so only the file lands in the target folder
          target: /home/airflow/airflow/dags/ # Your target folder on EC2 where the file will be copied
The given code is a GitHub Actions workflow named “Deploy to EC2 Instance” that is triggered under specific conditions and performs a file copy operation to an EC2 instance using SSH key authentication. Here is an explanation of the code in a structured format:
on:
- push: Specifies that the workflow will run when a push occurs on the specified branches.
- branches: Defines the branch name(s) for which the workflow will be triggered (e.g., main).
- paths: Specifies the folder path(s) that, if modified, will trigger the workflow (e.g., 'airflow-scripts/**').
- workflow_dispatch: Allows manual triggering of the workflow from the Actions tab in the GitHub repository.
jobs:
- deploy: Defines a job named “deploy” that will run the steps within it.
- runs-on: Specifies the operating system for the job (e.g., ubuntu-latest).
steps:
- Checkout repository: Uses the actions/checkout action to fetch the repository content.
- copy file via ssh key: Uses the appleboy/scp-action action to copy a file to the EC2 instance via SSH.
- host: Specifies the EC2 instance hostname or IP address.
- username: Specifies the username used to authenticate with the EC2 instance.
- key: Refers to the SSH_PRIVATE_KEY secret variable created in GitHub Secrets.
- source: Specifies the repository-relative path of the file to be copied.
- strip_components: Drops the leading airflow-scripts/ folder from the path so only the file itself is copied into the target folder.
- target: Specifies the target folder on the EC2 instance where the file will be created.
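Because the workflow also declares workflow_dispatch, you can trigger it manually from the Actions tab. If you use the GitHub CLI, a manual run can also be started with a command along these lines (the file name matches the workflow created earlier):
gh workflow run airflow_env.yml --ref main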
By configuring this workflow, any changes pushed to the specified branch ( main ) or folder ( airflow-scripts ) will trigger the workflow. It will then copy the specified file to the target folder on the EC2 instance using SSH key authentication.
When you push something into the airflow-scripts folder, the workflow will be triggered, and it will copy the Sample_ETL_Dag.py file to the specified target folder on your EC2 instance.
Once it's successfully executed, a copy of the Sample_ETL_Dag.py file will be created at the specified target folder on your EC2 instance, as shown below.
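To confirm the result on the EC2 side, you can list the DAGs folder and, assuming the Airflow CLI is available on that instance, check that the new DAG has been picked up:
ls -l /home/airflow/airflow/dags/
# Optional: confirm that Airflow has parsed the new DAG file
airflow dags list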
This workflow allows for seamless deployment of the Airflow DAG file to the remote server for execution. By automating this process, you can ensure that the latest version of your DAG is always available on the EC2 instance, enabling efficient and up-to-date task scheduling and workflow management in Airflow.
In conclusion, utilizing GitHub Actions for CI/CD of Airflow code in an EC2 instance can greatly streamline your development process. By automating the deployment of your Airflow code changes, you can save valuable time and effort that would otherwise be spent on manual deployments. GitHub Actions provides a powerful and flexible platform for creating workflows that integrate seamlessly with your repository, allowing you to trigger deployments based on specific events, such as pushes to relevant branches or modifications to specific folders.
Embrace the potential of GitHub Actions for CI/CD of Airflow code in your EC2 instance, and unlock a world of streamlined development and efficient deployment, empowering you to take your Airflow projects to new heights.
As an aspiring data engineer, take your skills to the next level with the Udemy SnowPro Core Certification Practice Set for 2023. Designed to help you excel in the SnowPro Core Certification exam, this practice set provides real-world scenarios and hands-on exercises to enhance your understanding of Snowflake data warehousing. Don’t miss this opportunity to gain practical experience, reinforce your knowledge, and pave the way for a successful career in data engineering. Enroll today and unlock your full potential!