Modernizing the Data Analytics Platform at Financial Engines
For a few years now, Financial Engines product engineering teams have been migrating their applications and infrastructure from the on-premises data center to the AWS Cloud. Earlier this year, the Data Engineering team at Financial Engines successfully migrated the Enterprise Data Warehouse from an on-premises Netezza instance to Amazon Redshift, modernizing the entire underlying stack along the way. This marks the first significant milestone in our journey of modernizing the Data Analytics Platform, as we execute towards the company’s strategic goal of servicing 10 million clients.
Every modernization effort is a time-bound result of the technical choices made, the project processes adopted and the team’s collaborative efforts. This article describes our modernization at the level of the technical solution architecture and briefly touches on the other, equally important aspects.
At Financial Engines, the Enterprise Data Warehouse serves as a centralized solution to analyze data from a host of operational systems. As the data backbone of the company, it
- powers over 200 Tableau dashboards for our geographically distributed decision support teams, enabling daily business decisions,
- fuels the critical campaign performance metrics for Distribution and Marketing teams in Salesforce CRM,
- supports call activity tracking for the National Advisor Center team and financial performance analysis for the FP&A team, and finally,
- powers the product recommendations for our Insights platform that reaches over 1 million clients.
Challenges — Before
Our legacy data warehouse, Netezza (IBM PureData), was a classic data warehouse implementation; see the diagram in Figure 2. We ran nightly batch data ingests from the data sources above, using a combination of ELT code in SSIS and Informatica. Tidal was our workflow scheduler, and most transformations were applied within the target as SQL stored procedures.
Over the years, the system had grown to host daily data for 10 million clients and approximately 12TB of financial data. More importantly, it was a treasure trove for data scientists to define market segmentation and run useful studies. We were already running into data size limits and had to physically shard our data across two Netezza instances. The primary instance was set aside to serve the frequently accessed queries with a daily SLA, whereas the infrequently accessed, SOX compliance related workloads with historical snapshots were moved to a secondary instance. The arrangement was inconvenient: end customers at times had to connect to two instances, and the team had to maintain separate workloads.
At a strategic level, there was a need to scale our data analytics platform to meet the company’s vision for growth. The Technology teams were well on their journey of migrating their workloads from the on-premises data center to the AWS Cloud. We reached the tipping point with the end-of-life support on our Netezza appliance and a contract renewal that would have dramatically increased our TCO.
Modernization — The Start
We began our modernization journey in early 2017 to meet both the short-term and the strategic business objectives.
With the engineering decision to adopt AWS Cloud, the team began its efforts by training themselves on AWS analytics services, with a few of our team members even completing the AWS certification!
To efficiently communicate and collaborate across the organization, we defined a roadmap with quarterly and monthly deliverables, and began tracking details in epics and user stories in JIRA.
Most importantly, we engaged the architecture team early and on a regular cadence to bake in security and compliance concerns, and we adopted the following guardrails:
- Prioritize delivering value to customers first, all other concerns next. From the end user’s perspective: don’t break things!
- “Reinvent” over “Lift-and-Shift”. We wanted to future-proof our architecture for evolving needs and make bigger changes than a straight lift and shift would allow.
- Strive for AWS managed services over custom solutions. We prioritized AWS Cloud Native solutions over commercial and custom implementations. For example, we chose Redshift over commercial alternatives and chose RDS with Postgres over self-managed Postgres.
- Adopt infrastructure-as-code for automated infrastructure deployment across dev, test and production accounts.
- Adopt AWS best practices and industry standards, such as: design for failure and nothing fails, think parallel rather than sequential execution, and build loosely coupled systems.
Modernization — After
The legacy technology stack in Figure 2 was completely replaced by the modern technology stack in Figure 3.
- First, we decided to replace Netezza with Redshift. Beyond the seemingly superficial switch from row-based to columnar technology, it meant rewriting all synonyms and stored procedures as flatter, file-based transformations. (We did evaluate EMR and Snowflake and were influenced by the article here.)
- Next, the team evaluated several ETL tools, both commercial (SSIS/Talend/Matillion) and open source (Kettle/AWS Data Pipeline/Dataduct), before deciding to develop our own ETL framework in Python. At that time, AWS Glue was not yet generally available.
- We selected Python 3.6, with PyCharm Community Edition as the IDE.
- The decision to go with a Python data pipeline required a complete rewrite of the ETL code from SSIS. On the positive side, it gave us the opportunity to address the deepest technical debt that we had punted on over the years.
- It also meant abandoning the enterprise workflow scheduler Tidal and switching to the open source Apache Airflow. We decided to go with Airflow after evaluating its capabilities against Luigi and Jenkins.
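To make the idea of a scheduled Python pipeline concrete, here is a minimal sketch of tasks executed in dependency order, which is the core job a workflow scheduler such as Airflow performs. It is not our actual framework and does not use the Airflow API: the task names, the sample rows and the `execution_order` helper are all hypothetical, and it runs on the standard library only.

```python
# Illustrative sketch only: task names, sample rows and helpers are hypothetical.

def extract(ctx):
    # Stand-in for pulling rows from an operational source system.
    ctx["raw"] = [("1001", "250.75"), ("1002", "99.10")]


def transform(ctx):
    # Cast text fields to typed values, as a file-based transformation might.
    ctx["clean"] = [(int(cid), float(bal)) for cid, bal in ctx["raw"]]


def load(ctx):
    # Stand-in for loading into the warehouse; here we only count the rows.
    ctx["loaded_rows"] = len(ctx["clean"])


TASKS = {"extract": extract, "transform": transform, "load": load}

# Each task maps to the list of tasks that must finish before it starts.
DEPENDS_ON = {"transform": ["extract"], "load": ["transform"]}


def execution_order(tasks, depends_on):
    """Return task names so that every task follows its upstream tasks."""
    order, done = [], set()

    def visit(name, path):
        if name in done:
            return
        if name in path:
            raise ValueError("dependency cycle at " + name)
        for upstream in depends_on.get(name, []):
            visit(upstream, path + [name])
        done.add(name)
        order.append(name)

    for name in sorted(tasks):
        visit(name, [])
    return order


def run_pipeline():
    """Run all tasks in dependency order, sharing state through a dict."""
    ctx = {}
    order = execution_order(TASKS, DEPENDS_ON)
    for name in order:
        TASKS[name](ctx)
    return order, ctx


if __name__ == "__main__":
    print(run_pipeline())
```

A real scheduler adds what this sketch leaves out: retries, scheduling intervals, parallel execution of independent tasks, and monitoring.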
We adopted several AWS Cloud best practices.
- Security was the paramount pillar. The Data Analytics architecture was designed to avoid the use of personally identifiable information (PII), though it still holds sensitive information. We secured data at rest within all our data storage layers (S3, Redshift and RDS) and used TLS for data in motion. IAM policies were defined following the principle of least privilege. The entire infrastructure was deployed within a secured VPC. Security on EC2 was baked in through AMIs from our DevSecOps team, with the root drive encrypted. Passwords were stored using the PMP tool, with access limited to the DevOps and DB administrators. Finally, we secured individuals’ access to the cloud at the network layer and integrated authentication with Okta.
- Decoupled storage from compute. We adopted the standard best practice and used S3 as the storage interchange. This helped us move away from point-to-point solutions: data producers push their data to S3 (eventually the Data Lake!) before it is ingested into Redshift.
- Keep it simple. We decided to go with us-west-1 since we already had AWS Direct Connect there. (We did have a preference for the us-west-2 region, as most new AWS services arrive earlier in that region.)
- Infrastructure-as-code. We decided to adopt the DevOps best practice of implementing infrastructure-as-code using Terraform and Ansible. More on DevOps practices and monitoring tools in another blog.
- Data migration from the on-premises data center to the AWS Cloud was done using custom tools and utilities that also verified consistency with the data in the legacy system. We did explore the AWS Database Migration Service but decided against it due to its lack of Netezza support and the additional complexity of configuring the tool.
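As an illustration of the kind of consistency check mentioned in the last bullet, the sketch below compares a legacy table and its migrated copy by row count and by an order-independent checksum. It is a hypothetical example rather than our actual tooling: the function names and the sample rows are invented, and it assumes both sides have already been fetched as tuples with values normalized to the same types.

```python
import hashlib

# Illustrative sketch only: function names and sample rows are hypothetical.

FIELD_SEPARATOR = "\x1f"  # unit separator, unlikely to occur in the data
NULL_MARKER = "\\N"       # keeps NULL distinct from the empty string


def table_fingerprint(rows):
    """Return (row_count, checksum) for an iterable of row tuples.

    Row hashes are summed modulo 2**256, so the result does not depend
    on row order, and duplicate rows still change the checksum.
    """
    count = 0
    total = 0
    for row in rows:
        canonical = FIELD_SEPARATOR.join(
            NULL_MARKER if value is None else str(value) for value in row
        )
        digest = hashlib.sha256(canonical.encode("utf-8")).digest()
        total = (total + int.from_bytes(digest, "big")) % (1 << 256)
        count += 1
    return count, format(total, "064x")


def compare_tables(legacy_rows, migrated_rows):
    """Compare two row sets by row count and by checksum."""
    legacy_count, legacy_sum = table_fingerprint(legacy_rows)
    migrated_count, migrated_sum = table_fingerprint(migrated_rows)
    return {
        "row_count_match": legacy_count == migrated_count,
        "checksum_match": legacy_sum == migrated_sum,
    }


if __name__ == "__main__":
    legacy = [(1, "Ann", None), (2, "Bob", 10.5)]
    migrated = [(2, "Bob", 10.5), (1, "Ann", None)]  # same rows, other order
    print(compare_tables(legacy, migrated))
```

In practice the two systems may render the same value differently (for example decimals or timestamps), so values would need to be normalized before hashing. That normalization is deliberately left out here.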
The table below summarizes the benefits we found with the switch to the new tech stack.
Comparison — Before and After
The milestone above and the adoption of AWS Cloud technologies pave the way for future business use cases. At the time of writing, we have successfully deployed a Data Lake in AWS, with a few of our teams onboarded. Looking ahead, we expect to continue our journey by developing event processing pipelines for near real-time analytics use cases and by incorporating AWS Glue and Redshift Spectrum into the platform. More on these efforts in the blogs ahead.