Case Study: Intuit’s 4 Learnings in Data Science Cloud Migration

The Data Standard
5 min read · May 28, 2020

Intuit’s Nhung Ho

The Data Standard is a new community for data scientists. In addition to our monthly digital live events, we feature community members and data science leaders in our content. Visit our site to learn more and become a member.

Developing personal finance products for Intuit “won’t be solved with a lot of data and a laptop,” says Nhung Ho, director of data science at Intuit, which owns TurboTax. That’s why, six years ago, Intuit decided to move into the cloud and begin its AI-driven transformation.

Ho recently spoke with The Data Standard on how she successfully led this migration. This is part two of the interview; read how she became a data scientist here. Here are her learnings from the experience:

1. Cloud migration doesn’t happen overnight.

I have to admit I didn’t fully understand what was needed to move to the cloud. I envisioned us snapping our fingers and being done! It was actually a really long journey: You’re moving from working on your laptop or an on-premises data center to completely changing your workflow in the cloud. And you don’t just move your data; you move your applications and your services, and you need to make sure the cloud environment you’re moving to is completely secure as well. I learned a lot of lessons as we migrated to the cloud.

2. Choosing a cloud provider isn’t “one cloud fits all.”

Without a doubt, most people want to know how to choose from all the cloud service providers. The cloud is now mainstream and the competition is fierce. A few years ago there were only three major providers but now there are six.

The AI services delivered by each provider are a distinguishing factor for data scientists, and new ones come out every month. I didn’t realize what a difference these capabilities can make in launching new products until I went through it myself.

About two-and-a-half years ago, we wanted to put a chatbot into our TurboTax product that would pop up to help customers with a problem before they needed to call in for help. We needed a conversational agent capability and a natural language understanding service that was scalable, extensible and highly accurate.

We’re a financial services company, so we didn’t have this expertise in-house. We faced a “build or buy” decision. We looked at all the cloud providers and found that all had battle-tested options in this area. By taking this capability from our cloud provider, we were able to leapfrog forward a few years in just a few months, provide a chat agent within TurboTax and build on top of it.

Another important lesson? It doesn’t have to be “one cloud fits all.” Regardless of which provider you choose, take a look at the AI capabilities being offered by other providers because “buying” these road-tested offerings can really expand your data science team beyond what it can do now.

3. If workflow doesn’t change, you’re not leveraging the power of the cloud.

Before the cloud migration, I thought I was at the forefront of working in a new way: All my work was done in a virtual machine, and I had my dev environment so fully specified that if my laptop crashed, I could bring it back up the same day. So I thought I would just drop my laptop into the cloud and go on with my day.

I soon realized that if you keep working within a laptop environment, you’re not leveraging the full power of the cloud. If the instance type you’re working on runs out of memory, you should be able to pull up a couple more nodes and distribute your workflow across them, and that’s not something you can do on a laptop. Even if you’re already using virtual machines on your laptop, you can’t do that kind of autoscaling without moving to a cloud-native tech stack.

The difference was apparent to me when we moved to containers. Not only did containers let us build our code and systems more cleanly, they enabled better collaboration as well: I could send a container file to a coworker, and they could reproduce the same dev environment on their machine and run the same code.

This new environment delivered incredible scale. If you run out of compute and need more memory, and you’re working on a cloud-native stack, you can spread your job across 100 nodes via Kubernetes and get it done in the same amount of time, without changing anything about your configuration. That’s where the real power of the cloud lies.
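As a rough illustration of what “without changing anything about your configuration” can look like, here is a minimal Python sketch using Dask, one of several frameworks for this fan-out pattern (the interview doesn’t specify Intuit’s actual stack). The `heavy_job` function and the scheduler address are illustrative placeholders; the only thing that changes between laptop and cluster is where the `Client` points.

```python
# Minimal sketch (assumes Dask; not necessarily Intuit's stack):
# the same fan-out code runs on a laptop or across cluster nodes.
from dask.distributed import Client

def heavy_job(chunk):
    # Stand-in for a memory-hungry computation on one slice of data.
    return sum(chunk) / len(chunk)

# On a laptop this starts a local cluster using the machine's own cores.
client = Client()
# On Kubernetes you would instead point at a remote scheduler, e.g.:
# client = Client("tcp://dask-scheduler:8786")  # address is illustrative

chunks = [list(range(i, i + 1000)) for i in range(0, 100_000, 1000)]
futures = client.map(heavy_job, chunks)  # distribute across all workers
results = client.gather(futures)         # collect results when finished
```

The workflow change is the point: once work is expressed as many independent tasks, adding nodes adds throughput, and the scheduler handles placement rather than the data scientist.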

4. If your products don’t change, you’re not leveraging the power of the cloud.

The massive scale of the cloud allowed us to do something we didn’t think was possible: cash flow forecasting in our QuickBooks product. The challenge lay in the fact that every small business is unique, just as every human is unique. We couldn’t simply aggregate data from all our customers, build one time-series model and then apply it to everyone; the result would be worthless.

To do it right, we needed to build an individual time series at runtime for every single small business. And because we have 4 million customers, that meant building 4 million time-series models from 1 billion transactions and then shipping them out in a timely manner.

The computational scale needed to do that is massive. One time a few years ago, I pulled in 100,000 time series and my laptop just keeled over and died! So when you think about 1 billion transactions for 4 million customers, the scale is unimaginable.

This is when we saw the potential of “distributed training” in the cloud. Rather than using the cloud only for data computations in preparation for machine learning, data scientists should think about how to distribute their training jobs so they can build a personalized model for each individual user. AI on a cloud-native stack makes that possible today. Our QuickBooks customers can now use these forecasts to make better business decisions.
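To make the “one model per customer” pattern concrete, here is a hedged Python sketch of distributed per-customer training, again using Dask as a stand-in for whatever framework a team actually runs. The trailing-mean “model,” the `transactions_by_customer` mapping and the column names are all hypothetical; the interview doesn’t describe Intuit’s real forecasting models.

```python
# Illustrative sketch (Dask as a stand-in; model and data are hypothetical):
# fit one small forecasting model per customer and fan the fits out
# across the cluster instead of looping on a single machine.
import pandas as pd
from dask.distributed import Client

def fit_customer_forecast(customer_id, txns):
    # txns: DataFrame with 'date' and 'amount' columns (hypothetical schema).
    # A trailing 4-week mean stands in for a real forecasting model.
    weekly = txns.set_index("date")["amount"].resample("W").sum()
    forecast = weekly.rolling(4, min_periods=1).mean().iloc[-1]
    return customer_id, float(forecast)

# Tiny synthetic stand-in for per-customer transaction data.
days = pd.date_range("2020-01-01", periods=90, freq="D")
transactions_by_customer = {
    cid: pd.DataFrame({"date": days, "amount": range(90)})
    for cid in ["biz_a", "biz_b", "biz_c"]
}

client = Client()  # local for testing; a Kubernetes-backed scheduler at scale
futures = [
    client.submit(fit_customer_forecast, cid, df)
    for cid, df in transactions_by_customer.items()
]
forecasts = dict(client.gather(futures))  # one independent model per customer
```

Because each customer’s fit is independent of every other’s, the 4-million-customer case is the same code with more workers; scale comes from the cluster, not from restructuring the modeling logic.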

Bottom line:

I have had some data scientists ask me, “Why do we care about all these cloud migration considerations?” The short answer is that I believe we are better data scientists if we understand the end-to-end flow of our work — not just the engineering aspect but how our results show up to the business end users.

I would highly recommend that today’s data scientists think of themselves not as people who build algorithms and “throw them over the wall,” but as strategic leaders involved in the data science process from start to finish.

Because we were involved in architecting the “big picture,” our cloud-powered data science allowed Intuit to deliver product features and solve customer problems that were thought to be unsolvable when I joined the company just six years ago.
