A database cloud transformation success story

Marco Lagalla
Storm Reply
Published in
7 min readApr 11, 2024

Introduction

Within IT infrastructure expenses, one of the key factors is product-related licenses. In the database world the on-premise, the extensive use of Oracle is certainly one of the most common scenarios one can witness during an assessment. This is often related to multiple factors like prior know-how about the technology or the availability of a DBA-side support system, and many databases use Oracle also where no use is made of specific engine features.

As part of a modernization project, Storm Reply worked with a customer, a leading global telecommunications company, to build a success story of a database migration and transformation at scale from Oracle to PostgreSQL on AWS.

Risks associated with the project

The application that underwent this modernization process was a mission-critical application for the customer, constituting the entire enterprise order management layer. Moreover, given the centrality within the business flows, one of the challenges posed by the business as a requirement was that the migration and transformation project should be independent of the release plan schedule of the application itself in any way.

The success criteria for the Business were identified as:

  • Elimination of licensing costs, through a move to engine-free;
  • Modernization of technological stack;
  • Enhanced availability and performance;
  • Reduction in TCO, by lowering operations-related costs where possible;
  • Reduction in storage costs, due to application logs;
  • Migration had to be decoupled from software evolutions;
  • The project had to implement mechanisms such that no application downtime would be required during go-live;
  • The migration had to provide confidence such that risks were minimized;
  • The project had to implement · rollback procedures to avoid any data loss.

Proposed solution

To be able to achieve all the goals set by the Business, Storm Reply has identified a target solution composed as follows:

  • Transformation of databases to a free-engine solution based on Postgres;
  • Technology modernization by moving from Oracle 11.2.0.1 to PostgreSQL 15.3;
  • Creation of the new infrastructure on AWS cloud, leveraging native solutions that can natively offer high reliability (redundancy) and increased performance (introduction of read replicas);
  • Through the use of only PaaS solutions (AWS RDS), the infrastructure management effort is essentially zeroed out, delegating it to the cloud provider;
  • 75% of Oracle database storage space is eliminated by spinning off application logging to a log management solution based on AWS OpenSearch (managed);
  • Data Model transformation and data migration are accomplished by leveraging cloud-native tools that automate the transformation and manage data migration.

AWS provides native tools specifically for database migrations, that allow to industrialise the migration process for both schema conversion and data migration.

With the Schema Conversion Tool (AWS SCT), it is possible to perform both a pre-migration assessment, to evaluate the degree and complexity of data model portability, and the actual conversion. AWS SCT supports homogeneous (same engine) and heterogeneous conversions (e.g., from Oracle to PostgreSQL), managing all database components (tables, views, procedures, functions, etc.).

Regarding data migration, the Database Migration Service (AWS DMS) allows leverage on an engine managed by AWS, which, through some migration tasks that can be configured directly in the AWS console, takes care of moving the data from the source database to the target database.

One of the winning features of the DMS is that it implements live synchronization between source and destination databases, applying logical replication of the data, thus minimizing the need for application downtime.

Migration approach

Given the large number of databases to be migrated and the super-challenging timeline of the project (6 months), it was decided to approach the entire project by leveraging automation.

1.1 Assessment

The initial phase of any migration is always the assessment, which in this case, in addition to evaluating the characteristics of the databases in question, also had to provide an assessment of portability to a new engine.

To do this, Storm Reply leveraged the AWS SCT to automate the creation of assessment reports, which were then evaluated together with the Development team and the infrastructure team to calculate the risk associated with the transformation.

Furthermore, since for historical reasons, the infrastructure on the on-premise was over-provisioned, and one of the main goals imposed by the business was to reduce the TCO, a right sizing analysis was a mandatory step.

When it comes to right-sizing databases, this results in a very time-consuming and complex operation, requiring many highly specialized Database Administrators and Solution Architects who can analyze metrics and usage patterns of a database to understand the actual resource utilization.

Given the number of databases and the timing, Storm Reply leveraged a proprietary solution capable of hooking into on-premise databases, extracting consumption metrics (Oracle AWR), and historicizing them on AWS OpenSearch for further analysis.

Using the metrics database thus constructed, through a set of ETL jobs (AWS Glue) it has been possible to analyze the actual resource consumption required by databases. Having built a large database of usage data, it was possible to conduct a right-sizing analysis not only over the entire historical period, which in the case of development environments or performance testing (done at well-defined times) could be misleading but over specific time-frames identified by the client. This allowed metrics for significant periods to be considered in the right-sizing calculations, ensuring that the database, once migrated, had the characteristics to withstand the expected workload.

The infrastructure used to create the assessment platform is depicted in the next diagram.

The output of the automatic metrics analysis is then cross-referenced with the characteristics of the different types of RDS instances offered by the cloud provider and leads to the production of an automatic right-sizing suggestion.

1.2 Transformation & Data Migration

The transformation of the data model, on databases with a 15-year life history was particularly complex. To reduce the time required, the AWS Schema Conversion Tool service was used, which can automatically perform the conversion of schemas, procedures, and functions from Oracle to Postgres.

The tool can perform the migration of objects autonomously, flagging any attention points or conversions not possible automatically.

This has made it possible to reduce the number of people on the database transformation team, greatly lightening and speeding up the process.

In terms of data migration, AWS DMS was used, which allowed for an initial load of the data (Full Load) and, through the Live Sync (CDC) function, took care of keeping the source databases aligned with the target databases.

1.3 Pre-rollout automated test environment

To allow autonomous tests for the development teams, as they could not block developments and the application release schedule, an automatic system was designed and set up that could duplicate environments.

Leveraging IaC (Terraform) to describe and implement the entire infrastructure, and Jenkins as the orchestration system, a new automation capable of replicating the entire AS-IS environment on-demand in AWS was made available to the development teams.

Specifically, by launching the automation, the development team was able to create a perfectly aligned parallel environment in minutes, including:

  • Application Servers on EC2
  • Balancing systems
  • Database (aligned from a snapshot created ad hoc during rollout)
  • Shared files systems

Once the environment was made available, the development team could run tests of different types in a perfectly aligned and isolated environment without fear of corrupting the database while testing procedures and functions converted from Oracle.

At the end of the tests, in order not to pay unnecessarily for test resources, again through automatism, the development team could delete the cloned environment. This type of operation was made available to the development team in a self-service manner, allowing them to refresh this parallel environment whenever necessary.

1.4 Rollback procedure and methodology

As mentioned earlier, the applications in question are very critical for the customer, so designing and implementing an effective rollback system was a mandated requirement from day one.

For the rollback procedure, we always used AWS DMS, which also allows moving data from the Cloud to On-Premise.

In particular, it was chosen to proceed as follows:

  1. At the point after the cut-over, when replication is complete, the main DMS that was feeding Postgres in the cloud from the on-premise Oracle is stopped.
  2. Before hooking the application to the new database, a new DMS task is instantiated, to replicate data from Postgres to the old On-Premise database.
  3. To make sure that the pre-go-live source of truth is not lost, a snapshot of the Oracle database is performed and restored to a shadow instance, with reduced hardware characteristics compared to production, for the sole purpose of ensuring service continuity in case of unresolvable issues.

As for application servers, to be able to minimize downtime in case of rollout, a parallel environment able to run on Postgres was released. At the same time, the old Oracle chain remains standing, managing the applicative switch via a balancer in AWS.

Achievements and results

Thanks to the migration methodology, the client could complete the migration. Specifically, the results achieved were:

  • 12 databases migrated in 6 months (65 TB data migrated);
  • 100% reduction of license costs, which led to a huge saving in running costs;
  • Improving the technology stack by moving to managed and next-generation systems;
  • Increased workload reliability and resilience by leveraging high-availability cloud-native systems;
  • Reduced storage costs and increased application observability through adopting the OpenSearch framework for logs management and analysis.

--

--