Migrating SCADA SPOFF Application to AWS Cloud: A Success Story

Andrea Giacobbe
Storm Reply
Jul 18, 2023

Abstract

This blog post presents the remarkable achievement of Edison S.p.A., our valued customer, in migrating their SCADA SPOFF application from on-premises systems to the Amazon Web Services (AWS) cloud. The article delves into the challenges encountered along the way, the architectural considerations taken into account, the proposed solution, and ultimately, the successful outcomes attained.

The challenge

Our customer, Edison S.p.A., has undertaken an extensive and remarkable journey of application migration over the past few years. Among the many inspiring tales, one that stands out is the SCADA SPOFF project. SCADA SPOFF is an Extract, Transform, Load (ETL) application that plays a crucial role in ingesting and processing data originating from power plants and various measuring devices.

Edison recognized the need to bid farewell to their on-premises legacy systems and embrace the advantages of cloud computing. By doing so, they aimed to optimize costs, enhance overall performance and unlock the potential for future scalability. Another important goal was to change the application paradigm: breaking down the application logic that was previously tightly intertwined with the Oracle database in the on-premises data centre, and shifting it to an external layer built on serverless cloud technology.

However, migrating the SCADA SPOFF application to the cloud required careful consideration of both business and technical requirements, along with a multitude of challenges.

- First and foremost, ensuring a high availability infrastructure was of utmost importance to guarantee uninterrupted business operations. Edison needed a robust infrastructure that could seamlessly handle server failures without compromising the continuity of their critical operations. This necessitated careful planning and implementation to minimize downtime and maintain a smooth user experience.

- Complying with regulatory and data sovereignty requirements was a critical consideration for the customer. They needed to ensure that all user data remained securely within the boundaries of Italy.

- The project had a strict requirement of limiting database downtime to a maximum of 15 minutes during the data migration phase.

- Furthermore, the migration process involved transferring a substantial amount of critical data, approximately 2.5TB, stored in an Oracle database. This required a well-defined and carefully orchestrated data migration plan, ensuring the integrity and consistency of the data throughout the transition.

- The migration was also an opportunity to upgrade the Oracle Database engine from the legacy Oracle Database 12c release used in their on-premises systems.

- Lastly, a robust backup strategy had to be defined. Edison recognized the importance of protecting and securing their data, mitigating the risks associated with accidental deletion, data corruption, hardware failures, natural disasters, or any other unforeseen events.

Overall, the migration of the SCADA SPOFF application to the cloud required a meticulous approach, careful planning, and effective execution. Edison’s commitment to leveraging cloud computing benefits such as cost optimization and performance improvement could only be met by addressing critical business and technical requirements, as well as overcoming numerous challenges along the way.

The architecture

When business and technological challenges put pressure on the on-premises IT world, that is when the power of cloud computing, and in particular the market leader, Amazon Web Services, comes into play.

The architecture relies on several AWS services, as shown in the diagram below. Serverless components such as Amazon EventBridge, AWS Lambda, Amazon SQS and AWS Step Functions were leveraged to orchestrate data replication between different sources: an Oracle database, Amazon S3 buckets and Amazon Redshift Serverless.

Every five minutes, an EventBridge rule invoked a Lambda function that read, from a DynamoDB table, the Step Functions steps for the different data replication paths to be scheduled, and then started the corresponding executions, as sketched below. A REST API Gateway was also deployed to let users start the various Step Functions via API calls.
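As a rough illustration of this scheduling pattern, the snippet below sketches a Lambda handler that reads replication jobs from a DynamoDB table and starts the corresponding Step Functions executions. The table name and attribute names are hypothetical, not the project’s actual schema.

```python
import json
import os

import boto3

# Hypothetical table name, for illustration only.
TABLE_NAME = os.environ.get("SCHEDULE_TABLE", "replication-schedule")

dynamodb = boto3.resource("dynamodb")
sfn = boto3.client("stepfunctions")


def handler(event, context):
    """Invoked by an EventBridge rule every five minutes."""
    table = dynamodb.Table(TABLE_NAME)
    started = []

    # Scan the schedule table; a real implementation would more likely
    # query an index on a "due_at" attribute instead of scanning everything.
    for item in table.scan()["Items"]:
        response = sfn.start_execution(
            stateMachineArn=item["state_machine_arn"],  # hypothetical attribute
            input=json.dumps(item.get("input", {}), default=str),
        )
        started.append(response["executionArn"])

    return {"started": started}
```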

The Step Functions supported the following execution paths:

1) The Orchestration Step Function replicated files, mainly between on-premises systems and S3 (in both directions), extracted JSON files and wrote the results to the Oracle database or Redshift.

2) Reading from Oracle and writing the results into the S3 bucket.

3) Executing the same SQL query on two different databases, in AWS or on-premises, and comparing the results. If the results differed, a notification was sent to the users via Amazon Simple Email Service (Amazon SES), as sketched in the example after this list.

4) Reading from AWS or on-premises databases and writing the results into the Oracle database or Redshift.

5) Reading Excel files from the S3 bucket, transforming them into multiple JSON files and writing the results into the Oracle database or Redshift.

6) Parsing XML files in the S3 bucket and writing the results into the Oracle database.
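As a minimal sketch of comparison path 3), assuming both databases are reachable from the function and the SES sender address is verified; the connection handling and all names below are illustrative:

```python
import boto3
import oracledb  # python-oracledb driver, assumed to be packaged with the function

ses = boto3.client("ses", region_name="eu-south-1")


def fetch_rows(user: str, password: str, dsn: str, query: str):
    """Run a query and return all rows; credentials would come from Secrets Manager in practice."""
    with oracledb.connect(user=user, password=password, dsn=dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(query)
            return cur.fetchall()


def compare_and_notify(query: str, db_a: dict, db_b: dict, recipients: list):
    rows_a = sorted(fetch_rows(**db_a, query=query))
    rows_b = sorted(fetch_rows(**db_b, query=query))

    if rows_a != rows_b:
        # Results differ: notify the users via Amazon SES.
        ses.send_email(
            Source="noreply@example.com",  # hypothetical verified sender
            Destination={"ToAddresses": recipients},
            Message={
                "Subject": {"Data": "Data comparison mismatch"},
                "Body": {"Text": {"Data": f"The following query returned different results:\n{query}"}},
            },
        )
```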

Moreover, multiple Amazon Elastic Compute Cloud (Amazon EC2) servers were deployed to:

- host the Oracle 19c database in an HA configuration, controlled by an observer node placed in an Auto Scaling group

- run cron jobs that monitor and ingest data from power plants

- act as database gateways to connect the Oracle database to external Microsoft SQL Server databases

Another REST API Gateway was used to ingest traffic coming from physical devices into the Oracle database.
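The article does not detail the integration behind this gateway; assuming a Lambda proxy integration, a plausible shape for the ingestion handler is sketched below. The DSN, credentials handling, table and column names are all placeholders.

```python
import base64
import json
import os

import oracledb  # python-oracledb driver

# Placeholder connection string; credentials would be read from Secrets Manager.
DSN = os.environ.get("ORACLE_DSN", "oracle-primary.internal:1521/SPOFF")


def handler(event, context):
    """API Gateway proxy handler that persists one device reading into Oracle."""
    body = event.get("body") or "{}"
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    reading = json.loads(body)

    with oracledb.connect(user=os.environ["DB_USER"],
                          password=os.environ["DB_PASSWORD"],
                          dsn=DSN) as conn:
        with conn.cursor() as cur:
            # Table and column names are illustrative only.
            cur.execute(
                "INSERT INTO device_readings (device_id, ts, value) "
                "VALUES (:device_id, :ts, :value)",
                reading,
            )
        conn.commit()

    return {"statusCode": 201, "body": json.dumps({"status": "stored"})}
```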

AWS Database Migration Service (DMS) was used to migrate the data from the on-premises database into the AWS database.

AWS Key Management Service (AWS KMS), AWS Secrets Manager and AWS Identity and Access Management (IAM) were used to secure and manage access to resources and sensitive information within the AWS environment.
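For example, database credentials can be kept out of application code by storing them in Secrets Manager and fetching them at runtime; the secret name below is hypothetical.

```python
import json

import boto3

secrets = boto3.client("secretsmanager", region_name="eu-south-1")


def get_db_credentials(secret_id: str = "spoff/oracle/app-user") -> dict:
    """Fetch and decode a JSON secret holding user/password/dsn (illustrative secret id)."""
    response = secrets.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])
```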

Finally, AWS services such as AWS X-Ray, Amazon CloudWatch and AWS CloudTrail were used to monitor the distributed cloud-based infrastructure and application, tracking various metrics and indicators to ensure the performance, availability and reliability of the infrastructure built.
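As one illustration of this kind of monitoring, the snippet below creates a CloudWatch alarm on failed Step Functions executions that notifies an SNS topic; both ARNs are placeholders, not the project’s actual resources.

```python
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="eu-south-1")

# Placeholder ARNs, for illustration only.
STATE_MACHINE_ARN = "arn:aws:states:eu-south-1:123456789012:stateMachine:orchestration"
ALERT_TOPIC_ARN = "arn:aws:sns:eu-south-1:123456789012:spoff-alerts"

cloudwatch.put_metric_alarm(
    AlarmName="spoff-orchestration-failures",
    Namespace="AWS/States",
    MetricName="ExecutionsFailed",
    Dimensions=[{"Name": "StateMachineArn", "Value": STATE_MACHINE_ARN}],
    Statistic="Sum",
    Period=300,                 # evaluate over 5-minute windows
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[ALERT_TOPIC_ARN],
)
```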

SCADA SPOFF Architecture

Solution

The whole infrastructure was deployed inside a Virtual Private Cloud (VPC) to ensure that all communication takes place in a private portion of the AWS Cloud.

The whole solution was designed and built on AWS. Most of the cloud infrastructure resides within the Milan region (eu-south-1), in line with the compliance requirement to keep user data within Italian boundaries. Redshift Serverless, instead, resides in the Frankfurt region (eu-central-1), as the service was not yet available in the Milan region.

Given the ETL nature of the application, the solution modernizes the on-premises infrastructure by leveraging the serverless world. In this architecture, AWS serverless components such as AWS Lambda and AWS Step Functions were deployed to design and manage complex event-driven execution paths, simplifying and evolving the application development.

To guarantee the 15-minute maximum downtime for the database, the Oracle database was deployed on EC2 servers in a Data Guard replication setup (an RDS managed database could not guarantee this requirement), spread across two Availability Zones and controlled by an Observer node placed on a third server within an Auto Scaling group, which fails the active database over to the standby node when needed. The two database nodes were kept synchronized by Oracle Data Guard, which automatically transmits redo changes from the primary database and applies them to the standby database.
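The failover itself is handled by Oracle’s Data Guard broker and the observer, not by custom code; still, as a minimal sketch of how an external component could verify which node currently holds the primary role, a health check might query `v$database` on each node (DSNs and credentials below are placeholders):

```python
import oracledb  # python-oracledb driver

NODES = {
    "az-a": "oracle-a.internal:1521/SPOFF",  # placeholder DSNs, one per Availability Zone
    "az-b": "oracle-b.internal:1521/SPOFF",
}


def database_roles(user: str, password: str) -> dict:
    """Return the Data Guard role (PRIMARY / PHYSICAL STANDBY) reported by each node."""
    roles = {}
    for name, dsn in NODES.items():
        with oracledb.connect(user=user, password=password, dsn=dsn) as conn:
            with conn.cursor() as cur:
                cur.execute("SELECT database_role FROM v$database")
                roles[name] = cur.fetchone()[0]
    return roles
```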

The Oracle Database engine was upgraded from version 12c, used on-premises, to version 19c on the target Oracle database on AWS.

To migrate the user data residing in the on-premises Oracle database, DMS was adopted to move the large database tables from the source on-premises database to the target database on AWS and to keep the two databases synchronized during the data migration phase.
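A hedged sketch of how such a task can be defined with boto3, assuming the replication instance and the source and target endpoints already exist (all ARNs and the schema name are placeholders). The `full-load-and-cdc` migration type performs the bulk copy first and then keeps applying ongoing changes, which is what lets the final cutover downtime stay within minutes.

```python
import json

import boto3

dms = boto3.client("dms", region_name="eu-south-1")

# Placeholder ARNs; the real ones identify the pre-created DMS resources.
task = dms.create_replication_task(
    ReplicationTaskIdentifier="spoff-oracle-migration",
    SourceEndpointArn="arn:aws:dms:eu-south-1:123456789012:endpoint:SOURCE",
    TargetEndpointArn="arn:aws:dms:eu-south-1:123456789012:endpoint:TARGET",
    ReplicationInstanceArn="arn:aws:dms:eu-south-1:123456789012:rep:INSTANCE",
    # Full load of the existing 2.5TB, then ongoing change data capture (CDC).
    MigrationType="full-load-and-cdc",
    TableMappings=json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "all-tables",
            "object-locator": {"schema-name": "SPOFF", "table-name": "%"},
            "rule-action": "include",
        }]
    }),
)
```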

To prevent any unexpected event from leading to a business loss, AWS Backup and the third-party software Rubrik were used to roll out backup plans that automatically back up the servers and the data within the databases at regular intervals. Hosting the database on EC2 servers complicated the data backup strategy and led us to choose Rubrik as an external tool. Rubrik was set up to generate a full database backup every day and a log backup every 30 minutes, so that these can be used in case the whole AWS region, or both of the Availability Zones in use, become unavailable.
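On the AWS Backup side, a daily plan for the EC2 servers could be rolled out roughly as below; the vault name, tag and schedule are illustrative, not the project’s actual configuration.

```python
import boto3

backup = boto3.client("backup", region_name="eu-south-1")

# Illustrative plan: one daily rule targeting a pre-created vault.
plan = backup.create_backup_plan(
    BackupPlan={
        "BackupPlanName": "spoff-daily",
        "Rules": [{
            "RuleName": "daily-ec2",
            "TargetBackupVaultName": "spoff-vault",      # placeholder vault
            "ScheduleExpression": "cron(0 2 * * ? *)",   # every day at 02:00 UTC
            "Lifecycle": {"DeleteAfterDays": 35},
        }],
    }
)

# Assign resources by tag; the tag key and value are hypothetical.
backup.create_backup_selection(
    BackupPlanId=plan["BackupPlanId"],
    BackupSelection={
        "SelectionName": "spoff-ec2",
        "IamRoleArn": "arn:aws:iam::123456789012:role/service-role/AWSBackupDefaultServiceRole",
        "ListOfTags": [{
            "ConditionType": "STRINGEQUALS",
            "ConditionKey": "project",
            "ConditionValue": "spoff",
        }],
    },
)
```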

To protect the whole workload from potential external attacks, an additional third-party security layer, F5, was added.

Results

Our AWS cloud-based solution has delivered exceptional results across various key areas. The most remarkable outcomes achieved are listed hereafter.

  1. High Availability (HA) Infrastructure: we implemented a robust and highly available infrastructure. This HA setup ensures that our system remains operational even in case of hardware failures or other unforeseen events, significantly reducing the risk of downtime and providing uninterrupted access to the services.
  2. Minimal downtime during data migration: by carefully planning and executing the migration phase, we kept downtime to just a few minutes, well below the 15-minute limit, during the migration of 2.5TB of critical data.
  3. Oracle database version upgrade: we successfully upgraded the Oracle database from version 12c to 19c on AWS. This upgrade unlocks the features and improvements offered by the newer Oracle Database release.
  4. Performance boost: by leveraging the capabilities of the cloud environment, the customer achieved a tenfold performance improvement. This significant enhancement ensures that the system can handle increased workloads efficiently and deliver a seamless user experience, even during peak usage periods.
  5. Cost optimization: the cloud solution allowed the customer to optimize costs effectively. During the migration, a replatforming initiative was undertaken to leverage serverless benefits. This approach not only enhanced the scalability and flexibility of the infrastructure, but also reduced infrastructure overhead and eliminated the need for provisioning and managing dedicated servers, resulting in substantial cost savings for the customer.

In conclusion, our cloud-based solution has delivered outstanding results across various dimensions, improving the capabilities, reliability, flexibility and cost-effectiveness of the whole system. These accomplishments position us for continued success and growth, providing an efficient and scalable solution for our customers’ needs.
