Unlocking on-premises data with AWS DMS
https://github.com/aynsof/aws-dms-example
Many teams want to use cloud services, but their data is locked away behind layers of firewalls. As more applications move to the cloud, it’s becoming harder to find app candidates that don’t rely on internal databases.
In many cases, migrating the whole database is an enormous task. Aside from the complexity of moving that much data to the cloud, labyrinthine existing pipelines process, transform, and load data into these databases, and repointing them at a new cloud-based database would be a colossal undertaking. There is also the question of how unstructured the data is, and how expensive it would be to migrate without restructuring it.
Application developers don’t want to wait for that to happen. They’re loving the new-found freedom and agility of owning their whole stack, from infrastructure to front-end, and the business owners are loving the massively reduced time-to-production.
There needs to be a way to safely expose the data within these databases to cloud-based applications.
Database Migration Service
Enter AWS Database Migration Service (DMS). Using the Terraform resources for DMS, developers can replicate selected schemas from internal Oracle databases into a read-only Postgres database in the cloud.
This Postgres replica can be hosted in a Virtual Private Cloud (VPC) with no internet-facing connections. The only ways in or out are:
- Highly-available VPN links to an on-premises datacentre/DR site
- VPC peering connection to an admin VPC
- VPC peering connections to other applications
VPC peering connections aren’t transitive, so a VPC peered with the DMS VPC can’t reach anything else through it. This massively reduces the risk of exposure to the wider internet.
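To make the non-transitive point concrete, here is a toy model in Python. It is only an illustration of the routing rule, not anything AWS-specific, and the VPC names are made up: traffic flows between two VPCs only if they are directly peered, never by hopping through a shared neighbour.

```python
def can_route(peerings, src, dst):
    """Return True only if src and dst share a direct peering connection.

    VPC peering does not forward traffic through an intermediate VPC,
    so no path search is performed here on purpose.
    """
    return frozenset((src, dst)) in peerings


# Hypothetical layout: every other VPC peers with the DMS VPC only.
peerings = {
    frozenset(("dms", "admin")),
    frozenset(("dms", "app-a")),
    frozenset(("dms", "app-b")),
}

print(can_route(peerings, "app-a", "dms"))    # True: direct peering
print(can_route(peerings, "app-a", "app-b"))  # False: no hop via the DMS VPC
print(can_route(peerings, "app-a", "admin"))  # False: no hop via the DMS VPC
```

An application peered with the DMS VPC can reach the Postgres replica, but it gains no route to the admin VPC or to any other application.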
DMS Architecture

This is the high-level view of the DMS infrastructure. In this example there are two major on-prem Oracle databases, each in its own datacentre. They are linked to the cloud environment by highly-available virtual private network (VPN) connections, which provide a secure tunnel through which DMS can communicate with the on-premises databases. There are dual VPN connections from each datacentre to the DMS environment, making four connections in total (simplified in the above image to reduce noise).
The DMS infrastructure consists of:
- A virtual private cloud (VPC), defining all the networking and security groups.
- Two DMS Source Endpoints, one for each of the Oracle databases. These endpoints define the address, port, credentials, and database type for the source databases.
- A DMS Replication Instance, which DMS uses to run the replication and to house the data while it is being translated from Oracle to Postgres.
- A DMS Target Endpoint, which points at a Postgres database that eventually stores and serves the data.
- Multiple DMS Replication Tasks, which define the schemas to be migrated. Each task links together a Source Endpoint, the Replication Instance, and the Target Endpoint.
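As a rough sketch of how a Replication Task ties these pieces together, the Python below builds the parameters that a DMS `create_replication_task` call expects. It is not taken from the example repo. The schema names, ARNs, and the `full-load-and-cdc` migration type are assumptions for illustration. The table mappings use DMS selection rules, with `%` as a wildcard for every table in a schema.

```python
import json


def selection_rules(schemas):
    """Build a DMS table-mapping document that includes every table in each schema."""
    rules = []
    for rule_id, schema in enumerate(schemas, start=1):
        rules.append({
            "rule-type": "selection",
            "rule-id": str(rule_id),
            "rule-name": f"include-{schema.lower()}",
            "object-locator": {"schema-name": schema, "table-name": "%"},
            "rule-action": "include",
        })
    return {"rules": rules}


def replication_task_params(task_id, source_arn, target_arn, instance_arn, schemas):
    """Link one source endpoint, the replication instance, and the target endpoint."""
    return {
        "ReplicationTaskIdentifier": task_id,
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        # Assumption: initial copy followed by ongoing change capture.
        "MigrationType": "full-load-and-cdc",
        # DMS takes the mappings as a JSON string, not a nested object.
        "TableMappings": json.dumps(selection_rules(schemas)),
    }


# Placeholder ARNs and schema names, purely for illustration.
params = replication_task_params(
    "oracle-a-to-postgres",
    "arn:aws:dms:placeholder:source-endpoint",
    "arn:aws:dms:placeholder:target-endpoint",
    "arn:aws:dms:placeholder:replication-instance",
    ["HR", "SALES"],
)
print(json.dumps(json.loads(params["TableMappings"]), indent=2))
```

With boto3 installed and credentials configured, a dictionary like this could be passed as `boto3.client("dms").create_replication_task(**params)`. One task per source database keeps each set of schemas independently restartable.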
Applications that want to use the data in the target Postgres database need to VPC peer with the DMS VPC. This is a manual process that involves performing a handshake (one side requests the peering, the other accepts it), adding routes for the peer’s address range to the route tables in each VPC, and enabling cross-VPC DNS.
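The peering steps could be scripted along the following lines. This is a minimal sketch rather than the process used here: it assumes both VPCs are in the same account and region, that `ec2` is a boto3 EC2 client (or anything with the same methods), and that the route table IDs and CIDR ranges are supplied by the caller. In practice the acceptance may need a short wait until the request is visible.

```python
def peer_with_dms_vpc(ec2, app_vpc_id, dms_vpc_id,
                      app_route_table_id, dms_route_table_id,
                      app_cidr, dms_cidr):
    """Peer an application VPC with the DMS VPC and return the peering ID."""
    # 1. Handshake: the application VPC requests, the DMS VPC accepts.
    response = ec2.create_vpc_peering_connection(
        VpcId=app_vpc_id, PeerVpcId=dms_vpc_id)
    pcx_id = response["VpcPeeringConnection"]["VpcPeeringConnectionId"]
    ec2.accept_vpc_peering_connection(VpcPeeringConnectionId=pcx_id)

    # 2. Routes: each side needs a route to the other side's address range.
    ec2.create_route(RouteTableId=app_route_table_id,
                     DestinationCidrBlock=dms_cidr,
                     VpcPeeringConnectionId=pcx_id)
    ec2.create_route(RouteTableId=dms_route_table_id,
                     DestinationCidrBlock=app_cidr,
                     VpcPeeringConnectionId=pcx_id)

    # 3. DNS: let each VPC resolve the other's private hostnames.
    ec2.modify_vpc_peering_connection_options(
        VpcPeeringConnectionId=pcx_id,
        RequesterPeeringConnectionOptions={"AllowDnsResolutionFromRemoteVpc": True},
        AccepterPeeringConnectionOptions={"AllowDnsResolutionFromRemoteVpc": True},
    )
    return pcx_id
```

Passing the client in as an argument keeps the function easy to dry-run against a stub before pointing it at a real account. Security groups on the Postgres database would still need to allow the application’s address range.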
Terraform Example
The example code is available here: https://github.com/aynsof/aws-dms-example
Applying it will create the DMS Endpoints, a DMS Replication Instance, and a DMS Task. It’s up to you to point it at a source database: it could be through a VPN to your on-premises infrastructure, or you could create a test RDS instance just to prove that the system works.
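For a sense of what pointing it at a source database involves, here is a sketch of the fields an endpoint needs, expressed as the parameters of a DMS `create_endpoint` call. It is not taken from the repo, which uses Terraform. The hostname, database names, and credentials are hypothetical, and a real setup should pull the password from a secret store rather than hard-coding it.

```python
# Default listener ports for the two engines used in this article.
DEFAULT_PORTS = {"oracle": 1521, "postgres": 5432}


def endpoint_params(identifier, endpoint_type, engine, host,
                    database, username, password, port=None):
    """Build the address, port, credentials, and database type for one endpoint."""
    if endpoint_type not in ("source", "target"):
        raise ValueError("endpoint_type must be 'source' or 'target'")
    return {
        "EndpointIdentifier": identifier,
        "EndpointType": endpoint_type,
        "EngineName": engine,
        "ServerName": host,
        "Port": port if port is not None else DEFAULT_PORTS[engine],
        "DatabaseName": database,
        "Username": username,
        "Password": password,
    }


# Hypothetical test source, e.g. a throwaway RDS Oracle instance.
test_source = endpoint_params(
    "test-oracle-source", "source", "oracle",
    "test-db.example.internal", "ORCL", "dms_user", "change-me",
)
print(test_source["Port"])  # 1521
```

Swapping the test instance for the real on-premises database later should only mean changing the host and credentials, since the rest of the DMS setup refers to the endpoint rather than to the database directly.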
Things to Note
VPN connections aren’t defined in Terraform: a terraform destroy would blow away their Elastic IPs and the on-prem VPN connections would stop working.
Being a managed service, DMS is fairly opaque: there’s not a lot of insight into its inner mechanics.
Results
With that said, DMS is a solid option for unlocking the data inside organisations’ internal databases. Because on-prem environments are typically slow-moving and highly restricted, this solution lets developers reach the data from AWS, so new systems can be prototyped and deployed far more rapidly.
