Upgrading Your Airflow 1/Composer 1 Environment to Airflow 2/Composer 2: A Comprehensive Migration Guide

jana_om
Google Cloud - Community
12 min read · Jan 3, 2024

A few months ago, Google Cloud sent an email to all active users of Cloud Composer 1:

“The final version of Cloud Composer 1 is nearing the end of support. Migrate to Cloud Composer 2 by March 25, 2024 for continued support.

On March 25, 2024, the last released Composer 1 version, Composer 1.20.12, will reach its planned end of support according to the versioning model. As of this date, Cloud Composer 2 will be the only General Availability product line and will continue to get new features and updates.

While your Cloud Composer 1 environments will continue to be available after their end of support, Cloud Support or Cloud Composer teams will not be able to support any potential issues in these environments.”

In this article, I will outline the steps involved in migrating to Airflow 2/Composer 2 and discuss the outcomes of the migration.

It’s important to note that Google Cloud provides extensive documentation with detailed explanations, which has proven to be incredibly useful. However, even with these helpful resources, unexpected challenges can still arise during the migration process. Despite these unforeseen issues, my personal experience with Google Cloud has been overwhelmingly positive, reinforcing my preference for working with GCP.

Select your migration strategy

If you are currently running Composer in a production environment, it is important to ensure that the migration process is as smooth and efficient as possible.

Fortunately, Google Cloud offers various migration guides in their documentation, allowing you to choose the best option that aligns with your project’s specifications and the number of DAGs involved. By following the appropriate migration guide, you can minimize any potential disruptions and seamlessly transition to the new environment.

In this article, I will discuss the migration process using snapshots, which, in my opinion, is the most efficient approach. When migrating a project with 20 DAGs, the preparation phase may require some time. However, the actual migration, which involves taking a snapshot of the existing environment and loading it into the newly created Composer 2 environment, typically takes around 2 hours. By diligently following all the necessary preparation steps, you can minimize the risk of encountering any critical issues during the migration process.

Step 1: Upgrade to Airflow 1.10.15

If you are currently running an Airflow version older than 1.10.15, upgrade your environment first by moving to a Cloud Composer 1 version that includes Airflow 1.10.15 and supports snapshots. The upgrade can be initiated through the console by navigating to the “Environment configuration” section within Composer. Once your Composer image is set to “composer-1.20.12-airflow-1.10.15”, you are ready for the migration. Keep in mind that the Composer image also comes with a set of preinstalled PyPI packages; these will matter later when we handle Airflow operators and backport packages.
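If you prefer the command line over the console, the same upgrade can be triggered with gcloud. This is only a sketch: the target image has to appear in the list of upgrade versions available for your environment, and COMPOSER_1_ENV / COMPOSER_1_LOCATION stand for your environment’s name and region.

gcloud composer environments update COMPOSER_1_ENV \
    --location COMPOSER_1_LOCATION \
    --image-version composer-1.20.12-airflow-1.10.15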

Step 2: Check compatibility with Airflow 2

Before proceeding with the migration to Composer 2, it’s critical to identify the components and configurations that need to be adjusted. A crucial aspect to keep in mind is that Composer 2 runs on Airflow 2, which means that any DAGs using “contrib” imports, specific to Airflow 1, will need to be updated. To ensure a smooth transition, it’s essential to review your existing DAGs and make the necessary adjustments to bring them in line with Airflow 2 requirements.

Fortunately, Google Cloud has a near-perfect solution to this challenge: the upgrade check script.

gcloud composer environments run \
COMPOSER_1_ENV \
--location=COMPOSER_1_LOCATION \
upgrade_check \
-- --ignore VersionCheckRule --ignore LoggingConfigurationRule \
--ignore PodTemplateFileRule --ignore SendGridEmailerMovedRule

Replace:

  • COMPOSER_1_ENV with the name of your Airflow 1.10.15 environment.
  • COMPOSER_1_LOCATION with the region where the environment is located.

Upon execution, the upgrade check script produces a report detailing the status of the existing environment and any potential compatibility issues it finds. The report includes a comprehensive list of recommendations, outlining the packages that need to be installed, the operators and hooks that require modification, and other relevant information.

❗️During this phase, it is crucial to gain a comprehensive understanding of your project’s structure. If your project relies on DAG templates sourced from another project, it’s important to note that the script may not automatically detect them. Similarly, if your project contains any sub-directories, the script may not be able to identify them.

This means that it’s up to you to manually check these areas and ensure that any necessary updates are made before proceeding with the migration process.
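A quick, low-tech complement to the script is a plain text search for legacy imports across your whole repository, including templates and sub-directories. A minimal sketch, assuming your DAG code lives in a local dags/ folder:

grep -rn "airflow.contrib" dags/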

Step 3: Make sure that your DAGs are ready for Airflow 2

Now that you’ve considered your project’s structure, let’s revisit the Composer image and examine the included PyPI packages. These packages, particularly “apache-airflow-backport-providers-google”, can prove to be invaluable time-savers throughout the migration process.

To gain a deeper understanding of what’s included in the “apache-airflow-backport-providers-google” package, be sure to check out its page on PyPI.

The importance of the backport packages

One key distinction between Airflow 1 and Airflow 2 is the manner in which external libraries and packages are imported. Airflow 1 relies on the use of “contrib” imports, while Airflow 2 employs “providers” imports. Here is an example:

#Airflow 1
from airflow.contrib.operators.bigquery_operator import BigQueryOperator

#Airflow 2
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

The introduction of backported provider packages offers users the ability to migrate their DAGs to the new providers package incrementally. By gradually converting to the new operators, sensors, and hooks, users can seamlessly transition their environments to Airflow 2.0. One of the advantages of the providers backport packages is the ability to simultaneously utilize both old and new classes, even within the same DAG. This phased and adaptable approach ensures a steady and methodical migration process that reduces the chances of unexpected issues and allows for adjustments to be made along the way.
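To illustrate this incremental approach, here is a minimal sketch of a DAG that mixes a legacy contrib operator with its providers-style counterpart while still running on Airflow 1.10.15 with the backport package installed. All DAG, query, and connection names are placeholders.

from datetime import datetime

from airflow import DAG
# Legacy Airflow 1 import, still untouched
from airflow.contrib.operators.bigquery_operator import BigQueryOperator
# Providers-style import made available by apache-airflow-backport-providers-google
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

with DAG(
    dag_id='mixed_imports_example',
    start_date=datetime(2024, 1, 1),
    schedule_interval='@daily',
    catchup=False,
) as dag:

    # Not migrated yet: old contrib operator
    legacy_task = BigQueryOperator(
        task_id='legacy_query',
        sql='SELECT 1',
        use_legacy_sql=False,
        bigquery_conn_id='your_bigquery_connection_id',
    )

    # Already migrated: new-style operator from the backport package
    migrated_task = BigQueryInsertJobOperator(
        task_id='migrated_query',
        configuration={'query': {'query': 'SELECT 1', 'useLegacySql': False}},
        gcp_conn_id='your_gcp_connection_id',
    )

    legacy_task >> migrated_task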

The “apache-airflow-backport-providers-google” package is an all-in-one solution that simplifies the migration process by eliminating the need to install multiple additional packages for various operators, hooks, and sensors. This streamlines the migration process, requiring only a few adjustments to your Airflow 1 imports and operators before redeploying your DAGs to Composer 1.

❗️Although the “apache-airflow-backport-providers-google” package is a comprehensive tool, it may not include every operator required for your specific project. In such cases, additional packages may need to be installed in your Composer 1 environment, such as:

  • apache-airflow-backport-providers-postgres
  • apache-airflow-backport-providers-sftp
  • apache-airflow-backport-providers-ssh
  • apache-airflow-backport-providers-salesforce
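
If you manage PyPI packages from the command line rather than the console, such a backport package can be added to your Composer 1 environment roughly like this (a sketch; the package name is just an example, and you may want to pin a version):

gcloud composer environments update COMPOSER_1_ENV \
    --location COMPOSER_1_LOCATION \
    --update-pypi-package apache-airflow-backport-providers-salesforce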

Later, once you are on Composer 2, you will need to reinstall these packages without the backports. For example, you will need to install “apache-airflow-providers-salesforce” to replace its backported version. This ensures that your project functions smoothly without any compatibility issues.

Deprecated operators

While the upgrade check script can provide helpful recommendations for updating your DAGs, it’s essential to take the time to verify that the suggested operators are not deprecated. Failing to do so could lead to unexpected problems and hinder the migration process.

For instance, while the script may suggest using the BigQueryExecuteQueryOperator in place of the BigQueryOperator, a closer examination of the Airflow documentation reveals that the BigQueryExecuteQueryOperator is actually deprecated.

This operator [BigQueryExecuteQueryOperator] is deprecated. Please use airflow.providers.google.cloud.operators.bigquery.BigQueryInsertJobOperator instead.

Given that both the BigQueryOperator and BigQueryInsertJobOperator serve the same purpose, which is to execute a BigQuery job, it’s advisable to use the BigQueryInsertJobOperator instead of the deprecated BigQueryExecuteQueryOperator.

Let’s see the difference:

#Airflow 1
from airflow.contrib.operators.bigquery_operator import BigQueryOperator

task1 = BigQueryOperator(
    task_id='execute_query1',
    sql=query1,
    destination_dataset_table='your_project.your_dataset.destination_table1',
    write_disposition='WRITE_TRUNCATE',
    create_disposition='CREATE_IF_NEEDED',
    allow_large_results=True,
    flatten_results=False,
    use_legacy_sql=False,
    bigquery_conn_id='your_bigquery_connection_id',
    dag=dag
)
#Airflow 2
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

task1 = BigQueryInsertJobOperator(
    task_id='execute_query1',
    configuration={
        'query': {
            'query': query1,
            'destinationTable': {
                'projectId': 'your_project',
                'datasetId': 'your_dataset',
                'tableId': 'destination_table1'
            },
            'writeDisposition': 'WRITE_TRUNCATE',
            'createDisposition': 'CREATE_IF_NEEDED',
            'allowLargeResults': True,
            'flattenResults': False,
            'useLegacySql': False
        }
    },
    gcp_conn_id='your_gcp_connection_id',
    dag=dag
)

Other Operators

In many cases, the parameters used for the operators will remain unchanged during the migration process. The main adjustments required are typically limited to updating the imports and changing the operator’s name.

For example, when migrating from the BigQueryToCloudStorageOperator to the BigQueryToGCSOperator, or from the FileToGoogleCloudStorageOperator to the LocalFilesystemToGCSOperator, or from the PostgresToGoogleCloudStorageOperator to the PostgresToGCSOperator, the parameters used for these operators will likely stay the same. This consistency can simplify the migration process and reduce the probability of errors.

#Airflow 1
from airflow.contrib.operators.bigquery_to_gcs import BigQueryToCloudStorageOperator

task1 = BigQueryToCloudStorageOperator(
    task_id='export_data',
    source_project_dataset_table='your_project.your_dataset.your_table',
    # the GCS destination is required by both versions of the operator
    destination_cloud_storage_uris=['gs://your_bucket/exports/your_table_*.csv'],
    export_format='CSV',
    bigquery_conn_id='your_bigquery_connection_id',
    dag=dag
)
#Airflow 2
from airflow.providers.google.cloud.transfers.bigquery_to_gcs import BigQueryToGCSOperator

task1 = BigQueryToGCSOperator(
    task_id='export_data',
    source_project_dataset_table='your_project.your_dataset.your_table',
    destination_cloud_storage_uris=['gs://your_bucket/exports/your_table_*.csv'],
    export_format='CSV',
    gcp_conn_id='your_gcp_connection_id',
    dag=dag
)

PythonOperator, BashOperator

These operators will not cause any issues during the migration: the old import paths still work in Airflow 2/Composer 2 and only raise deprecation warnings.

#Airflow 1
from airflow.operators.python_operator import PythonOperator
from airflow.operators.bash_operator import BashOperator

However, it is advisable to update them after the migration, specifically in Composer 2, to ensure compatibility and take advantage of any improvements.

#Airflow 2
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

Hooks usage

The usage of Hooks does not significantly differ between Airflow 1 and Airflow 2. Let’s see the examples:

#Airflow 1

from airflow.contrib.hooks.ssh_hook import SSHHook
<...>
ssh_hook = SSHHook(ssh_conn_id='my_ssh_conn')
<...>

from airflow.contrib.hooks.bigquery_hook import BigQueryHook
<...>
bigquery_hook = BigQueryHook(bigquery_conn_id='my_bigquery_conn')
<...>
#Airflow 2

from airflow.providers.ssh.hooks.ssh import SSHHook
<...>
ssh_hook = SSHHook(ssh_conn_id='my_ssh_conn')
<...>

from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook
<...>
bigquery_hook = BigQueryHook(gcp_conn_id='my_gcp_conn')
<...>

Airflow database clean up DAG

It’s worth noting that the upgrade check script may not detect any errors in the airflow_db_cleanup DAG. As a result, it’s essential to double-check for differences between the Airflow 1 and Airflow 2 versions of that DAG and make any necessary adjustments before deploying the Airflow 2-compatible version to Composer 1.

‘Legacy UI is deprecated by default’ issue

You no longer have to explicitly set the RBAC UI in the configuration because it is the default UI in Airflow 2. If you were using the non-RBAC UI before, you need to switch to the RBAC UI and create users in order to access the Airflow webserver. Make sure the corresponding configuration is already enabled in your Composer 1 environment before you take the snapshot.
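On Composer 1 this comes down to the webserver’s rbac setting, which you can set in the console under Airflow configuration overrides or with gcloud. A minimal sketch, assuming the standard override key format:

gcloud composer environments update COMPOSER_1_ENV \
    --location COMPOSER_1_LOCATION \
    --update-airflow-configs=webserver-rbac=True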

The Airflow URL path will be updated from /admin to /home.

Check the results

Once you’ve successfully updated all your operators, hooks, sensors, and made other changes, it’s time to run the upgrade check script again to verify the results. If the script reports that everything is in order and “World is beautiful”, you can proceed to the next stage of the migration.

Step 4: Pause DAGs in your Cloud Composer 1 environment

Before capturing a snapshot of your Composer 1 environment, it’s important to pause all DAGs to prevent any duplicate runs during the migration process. This can be done manually using the Airflow UI or by running the “composer_dags” script.

python3 composer_dags.py --environment COMPOSER_1_ENV \
--project PROJECT_ID \
--location COMPOSER_1_LOCATION \
--operation pause

Replace:

  • COMPOSER_1_ENV with the name of your Cloud Composer 1 environment.
  • PROJECT_ID with the Project ID.
  • COMPOSER_1_LOCATION with the region where the environment is located.

Step 5: Save the snapshot of your Cloud Composer 1 environment

Once your DAGs have been paused, the next step is to save a snapshot of your Composer 1 environment. This can be accomplished in two ways: either through the Composer console or by executing gcloud commands.

It’s worth highlighting that any changes or DAG runs that occur in your Composer 1 environment after saving the snapshot will not be reflected when you load the snapshot in the future. Additionally, it’s important to note that while you can save snapshots in Composer 1, you cannot load them in the same environment. Snapshots can only be loaded in Composer 2 environments.
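For reference, saving the snapshot with gcloud looks roughly like this (a sketch; the command is on the beta track, and by default the snapshot is written to the environment’s bucket under the snapshots/ folder, which gives you the path needed in Step 7):

gcloud beta composer environments snapshots save \
    COMPOSER_1_ENV \
    --location COMPOSER_1_LOCATION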

Step 6: Create a Cloud Composer 2 environment

Now that you’ve saved your snapshot, it’s time to create a Cloud Composer 2 environment. You can do this through the console, or by using a tool like Terraform. Be sure to review the available optional parameters and configurations to ensure that your Composer 2 environment is set up to meet your specific needs.

❗️It’s important to note that the Cloud Composer Service Agent account must have the Cloud Composer v2 API Service Agent Extension role for your Cloud Composer 2 environment to function properly.
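One way to grant that role is with gcloud. A minimal sketch, assuming the default Cloud Composer Service Agent account; replace PROJECT_ID and PROJECT_NUMBER with your own values:

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:service-PROJECT_NUMBER@cloudcomposer-accounts.iam.gserviceaccount.com" \
    --role="roles/composer.ServiceAgentV2Ext"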

Additionally, it’s worth highlighting some of the significant differences between Composer 1 and Composer 2. With Composer 2, the environment runs on a Google Kubernetes Engine cluster in Autopilot mode, which greatly enhances the performance and reliability of your Airflow DAGs. From personal experience, you can expect your DAGs to run significantly faster in Cloud Composer 2 compared to Cloud Composer 1.

Step 7: Load the snapshot to your Cloud Composer 2 environment

Once your Composer 2 environment is up and running, you can load your previously saved snapshot to bring your DAGs and configurations into the new environment. This can be accomplished through the console or by executing a gcloud command.

gcloud beta composer environments snapshots load \
COMPOSER_2_ENV \
--location COMPOSER_2_LOCATION \
--snapshot-path "SNAPSHOT_PATH"

Replace:

  • COMPOSER_2_ENV with the name of your Cloud Composer 2 environment.
  • COMPOSER_2_LOCATION with the region where the Cloud Composer 2 environment is located.
  • SNAPSHOT_PATH with the URI of your Cloud Composer 1 environment's bucket, followed by the path to the snapshot. For example, gs://us-central1-example-916807e1-bucket/snapshots/example-project_us-central1_example-environment_2022-01-05T18-59-00.

It’s important to note that during the snapshot loading process, Composer transfers configuration overrides, environment variables, and PyPI packages from Cloud Composer 1 to Cloud Composer 2 without any modifications or adjustments for compatibility.

If custom PyPI packages cause dependency conflicts, you can opt to skip their installation during the snapshot loading process. You can install the necessary providers packages directly on your Composer 2 environment and then skip the installation of PyPI packages from your snapshot. For example, if you previously had “apache-airflow-backport-providers-salesforce”, install “apache-airflow-providers-salesforce”. This approach helps avoid any potential conflicts with PyPI packages and ensures a smoother migration process overall.
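In practice this means installing the non-backport provider packages on the new environment yourself and then loading the snapshot with the skip flag. A sketch (the package name is just an example; verify that your gcloud version supports the flag):

gcloud composer environments update COMPOSER_2_ENV \
    --location COMPOSER_2_LOCATION \
    --update-pypi-package apache-airflow-providers-salesforce

gcloud beta composer environments snapshots load \
    COMPOSER_2_ENV \
    --location COMPOSER_2_LOCATION \
    --snapshot-path "SNAPSHOT_PATH" \
    --skip-pypi-packages-installation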

Step 8: Unpause DAGs in the Cloud Composer 2 environment

Once the DAGs have been successfully loaded to your Composer 2 environment and are visible in the Airflow UI, it’s time to unpause them so they can resume their normal operation. This can be done manually through the Airflow UI or by running the same “composer_dags” script you used in Step 4.

python3 composer_dags.py --environment COMPOSER_2_ENV \
--project PROJECT_ID \
--location COMPOSER_2_LOCATION \
--operation unpause

Replace:

  • COMPOSER_2_ENV with the name of your Cloud Composer 2 environment.
  • PROJECT_ID with the Project ID.
  • COMPOSER_2_LOCATION with the region where the environment is located.

Step 9: Check for DAG errors

After unpausing your DAGs, it’s crucial to closely monitor their runs in the Composer 2 environment to ensure that they are successful. Check the destination of any processed data, such as querying it in BigQuery or examining the files in the bucket, to verify that the migration was successful.
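A quick way to surface DAG files that no longer parse is to ask Airflow 2 for its import errors through the Composer CLI wrapper. A sketch; the list-import-errors subcommand is available in recent Airflow 2 releases:

gcloud composer environments run COMPOSER_2_ENV \
    --location COMPOSER_2_LOCATION \
    dags list-import-errors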

In the unlikely event that any issues arise during the DAG runs, you have the option to delete your Cloud Composer 2 environment and resume operations in Cloud Composer 1. However, it’s typically recommended to try to troubleshoot and resolve any issues in the Composer 2 environment, as doing so can help you take full advantage of the platform’s enhanced capabilities and avoid the need to revert back to Composer 1.

Airflow db upgrade

Following the migration, you may see a notification in the Airflow UI of your Composer 2 environment about tables that were moved aside during the database upgrade. If so, simply open the “Upgrading” link in the notification and adhere to the provided instructions for dropping the moved tables.

Step 10: Monitor your Cloud Composer 2 environment

Once you’ve confirmed that your Composer 2 environment is functioning smoothly, you can begin considering deleting your Composer 1 environment. This decision should only be made after a thorough evaluation of your system’s performance and a sufficient period of time has passed to ensure that no unforeseen issues arise.

By deleting your Cloud Composer 1 environment, you can free up resources and simplify the management of your system. However, it’s important to proceed with caution and make sure that you have a solid backup plan in place in case any issues arise in the future.

Congratulations 👏

If you have any questions or would like to share your experiences about migration from Cloud Composer 1 to Cloud Composer 2, feel free to reach out to me on LinkedIn. I’d love to hear your story!
