Published in Intuit Engineering

The Intuit Data Journey

Accelerating Development of Smart, Personalized Financial Products & Services

A Tipping Point in our Journey

Back in late 2017, Intuit’s evolution from a collection of internally developed and acquired flagship products to a suite of connected financial services was in full swing, and our customer base was growing rapidly around the globe. At the same time, internal demand for access to high-quality, real-time data was growing exponentially, pushing our legacy data infrastructure and platform operability to their limits. Refreshing our infrastructure was expensive, and we had no compute elasticity to support data-heavy jobs such as real-time and batch processing, feature extraction and ML model training. It used to take us several quarters to roll out a new ML model. We needed to hire specialized talent to maintain and operate proprietary infrastructure solutions. Our data was siloed, and we lacked reliable tools for data discovery, lineage, and cleansing. A small data platform team could not sustainably support the ever-growing community of data workers (engineers, analysts and data scientists) at the speed at which the company needed to innovate.

A Refreshed Data Strategy

We refreshed our data strategy to center on the following key themes:

  • Transactional persistence infrastructure
  • Data catalog to discover and track data and lineage
  • Extensible data ingestion framework
  • Data pipeline orchestration and management
  • Real-time distributed stream processing
  • Data lake and lake-house infrastructure
  • Machine learning training and hosting infrastructure
  • Model lifecycle management tools and workflows
  • Feature engineering and feature store capabilities
  • Curated, global data model
  • Unified user experience and data portal
  • Data exploration, curation and enrichment

Outcomes and Benefits

As of late 2019, we had migrated our entire data and machine learning infrastructure to the cloud and modernized our core data technologies. Since cutting over to the new platform, we have seen 50 percent fewer operational issues. We are seeing 20X more model deployments, and the platform has helped cut model deployment time by 99 percent. For our data analysts, a huge delighter has been the improvement in data freshness, from multiple days down to an hour. We have significantly reduced the effort required to run FMEA (failure mode and effects analysis) tests and load tests and to handle peak-season traffic, while increasing our confidence in all three. We now have a unified real-time instrumentation and clickstream tracking infrastructure, which makes it much easier for internal consumers to find what they need, in one place, across all Intuit offerings. Across the company, we now see a strong focus on real-time stream processing, with hundreds of stream processors for ML and analytics, and a shift in how real-time code is deployed.
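Data freshness can be read as the lag between the current time and the newest data available to analysts. The minimal Python sketch that follows checks that lag against a one-hour target. The function names, the one-hour constant, and the assumption that each dataset exposes a single latest-event timestamp are illustrative only, not a description of Intuit's tooling.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical target, taken from the "down to an hour" figure in the text.
FRESHNESS_TARGET = timedelta(hours=1)


def freshness_lag(latest_event_time: datetime, now: Optional[datetime] = None) -> timedelta:
    """How far the newest available data trails the current time."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - latest_event_time


def meets_freshness_target(latest_event_time: datetime,
                           target: timedelta = FRESHNESS_TARGET,
                           now: Optional[datetime] = None) -> bool:
    """True if the dataset is no more than `target` behind `now`."""
    return freshness_lag(latest_event_time, now) <= target


# Usage: fixed, made-up timestamps keep the example deterministic.
now = datetime(2019, 12, 2, 12, 0, tzinfo=timezone.utc)
print(meets_freshness_target(datetime(2019, 12, 2, 11, 15, tzinfo=timezone.utc), now=now))  # True (45 minutes behind)
print(meets_freshness_target(datetime(2019, 11, 29, 12, 0, tzinfo=timezone.utc), now=now))  # False (3 days behind)
```

Passing `now` explicitly makes the check testable; in a scheduled job it would default to the current UTC time.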

Migration and Maturation

Moving both our transactional and analytical data systems and our ML systems to the cloud took two years, and it involved every team at Intuit, from engineers on the data teams to business product teams, data scientists, analysts, product managers and program managers. It also took a very engaged and highly responsive cloud partner (Amazon Web Services) to quickly turn around product requests on security, data and pipeline migration. The approach focused on: 1) migrating on-premise data to the cloud, 2) rewriting producers and transactional systems to start ingesting into the cloud, 3) rewriting data processing pipelines in the cloud, and 4) operating the two systems in parallel, with constant parity checks and validations, before switching over completely.
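Step 4, running both systems in parallel with constant parity checks, can be illustrated with a small comparison between an on-premise extract and its cloud counterpart. This is a minimal sketch, assuming both sides can be read as lists of dictionaries that share a primary key; the names `ParityReport`, `row_digest` and `check_parity`, and the sample rows, are hypothetical. A production check would typically run inside the pipelines on partitions or aggregates rather than in memory.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Iterable, Mapping, Sequence, Set

Row = Mapping[str, object]


@dataclass
class ParityReport:
    """Result of comparing one dataset across the legacy and cloud pipelines."""
    source_count: int
    target_count: int
    missing_in_target: Set[object] = field(default_factory=set)
    extra_in_target: Set[object] = field(default_factory=set)
    mismatched: Set[object] = field(default_factory=set)

    @property
    def in_parity(self) -> bool:
        return (self.source_count == self.target_count
                and not self.missing_in_target
                and not self.extra_in_target
                and not self.mismatched)


def row_digest(row: Row, columns: Sequence[str]) -> str:
    # Hash the compared columns in a fixed order so column order in either
    # system does not matter. repr() is deliberately strict (1 != 1.0), which
    # surfaces type drift introduced by a rewritten pipeline.
    payload = "\x1f".join(repr(row.get(c)) for c in columns)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def check_parity(source: Iterable[Row], target: Iterable[Row],
                 key: str, columns: Sequence[str]) -> ParityReport:
    source_rows, target_rows = list(source), list(target)
    # Raw row counts catch duplicate keys that the maps below would collapse.
    src = {r[key]: row_digest(r, columns) for r in source_rows}
    tgt = {r[key]: row_digest(r, columns) for r in target_rows}
    shared = src.keys() & tgt.keys()
    return ParityReport(
        source_count=len(source_rows),
        target_count=len(target_rows),
        missing_in_target=set(src.keys() - tgt.keys()),
        extra_in_target=set(tgt.keys() - src.keys()),
        mismatched={k for k in shared if src[k] != tgt[k]},
    )


# Usage with made-up rows: id 2 drifted, id 3 never landed, id 4 is unexpected.
on_prem = [{"id": 1, "amount": 100, "status": "paid"},
           {"id": 2, "amount": 250, "status": "open"},
           {"id": 3, "amount": 75, "status": "paid"}]
cloud = [{"id": 1, "amount": 100, "status": "paid"},
         {"id": 2, "amount": 250, "status": "closed"},
         {"id": 4, "amount": 30, "status": "open"}]
report = check_parity(on_prem, cloud, key="id", columns=["amount", "status"])
print(report.in_parity)  # False
print(sorted(report.missing_in_target), sorted(report.extra_in_target), sorted(report.mismatched))  # [3] [4] [2]
```

Equal row counts alone would have reported this pair as matching; comparing per-key digests is what exposes the missing, extra and drifted records.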

Retrospective: What Worked Well

Any complex migration requires maintaining two completely independent, parallel data ecosystems until the last data consumer moves over to the new system. This introduces a potentially significant "double-bubble" cost (paying for both systems at once) for the duration of the migration, straining every team that supports the migration as well as the budget. We had the highest level of support across the company, which helped fund the dual effort for a couple of years.

Retrospective: Lessons Learned

After a few false starts, our data cloud migration got on track once we pulled the data producers, data consumers and data platform teams together into one mission-based team, with a real-time dashboard tracking impediments, progress, and data parity across on-premise and cloud pipelines. Once in place, these data-backed dashboards sharpened everyone's focus and alignment on common outcomes.

Our Journey Continues

Almost every aspect of our legacy data ecosystem has now undergone some form of modernization, from our transactional persistence stores to our Hadoop grid, ingestion and processing technologies, clickstream tracking, ML capabilities, data marts, data catalog, real-time stream processing, feature engineering, and curation technologies. But the work continues.



Mammad Zadeh

Mammad is currently leading the data platform organization at Intuit. Previously he was an executive at LinkedIn, Netflix, and Yahoo.