Building a flexible platform for optimal use of LLMs

Oren Dar
Intuit Engineering
Published in Intuit Engineering · Mar 4, 2024 · 6 min read

This blog is co-authored by Oren Dar and Nitzan Gado, staff data scientists at Intuit.

With the ability to process natural language and produce content across a wide range of topics, generative artificial intelligence (GenAI) is poised to transform entire industries and fields of study. GenAI makes it possible to personalize and optimize individual user experiences. It facilitates work across a wide range of tasks, making individuals and organizations far more productive.

To realize the promise of GenAI at scale and with speed, Intuit has developed a GenAI operating system (GenOS), a system that enables new, game-changing GenAI-powered experiences. It integrates commercial and open-source large language models (LLMs) with our own custom-trained financial LLMs that specialize in solving tax, accounting, marketing, cash flow, and personal finance challenges.

Our goal is to create seamless access to multiple LLMs for Intuit developers, so that models can be matched appropriately to individual use cases, compared with one another, and customized as efficiently and effectively as possible, all in service of creating awesome experiences for Intuit’s 100 million consumer and small business customers with TurboTax, Credit Karma, QuickBooks and Mailchimp.

There is no one-size-fits-all LLM solution

Choosing, fine-tuning or building the right LLM is a fundamental part of developing generative AI-powered applications. To address use cases, we carefully evaluate the pain points where off-the-shelf models would perform well and where investing in a custom LLM might be a better option.

A given LLM is only as good as the information on which it is trained. As a result, every model has its strengths and weaknesses. As a general rule, the more information a model has available, the better and more accurate its output — and the more resources it requires to develop, maintain and use. Choosing models that provide answers accurately and efficiently while remaining useful for the widest possible set of applications is a key challenge.

The enormity of that challenge means there are no one-size-fits-all solutions. Across the globe, the number of LLMs in development and production continues to rise as organizations seek to address an enormous variety of commercial, academic, governmental and social needs.

Intuit’s LLM-agnostic approach is designed to maintain flexibility in this rapidly moving space. Specifically, GenOS provides Intuit technologists access to a curated set of third-party commercial models and open-source models, in addition to proprietary, custom-trained financial LLMs. This allows engineers to locate and deploy LLMs that best match their specific use cases — or tailor an existing LLM to match their use case, if necessary.

Offering that level of flexibility in a developer-friendly package comes with steep challenges that Intuit has worked diligently to overcome. Given that any enterprise company that’s innovating with GenAI at scale in a highly regulated industry like fintech will face similar challenges, we’re sharing our firsthand experiences in this blog on how we solved the problem of providing this flexibility. We hope you find it helpful on your journey with GenAI.

Challenge 1: Building and maintaining a catalog of fine-tuned LLMs

Before we could offer Intuit developers a wide variety of LLMs to choose from, we first had to develop, maintain and evolve a catalog of LLMs suitable for the types of use cases that arise across our businesses. The rapid evolution of models means this catalog must be a dynamic entity.

This means expanding and refining our LLM catalog to include the most useful, highest-quality, and most recently updated third-party LLMs. It also means carefully considering the right base model on which to build new offerings, and updating them accordingly so they continue to work properly with existing tools.
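To make the idea of a dynamic catalog concrete, here is a minimal sketch of how catalog entries and lookups might be represented. The class names, metadata fields, and selection criteria below are illustrative assumptions, not GenOS internals.

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    """Metadata for one LLM in the catalog (illustrative fields only)."""
    name: str
    provider: str                                # e.g. "commercial", "open-source", "in-house"
    domains: set = field(default_factory=set)    # use cases the model suits
    cost_per_1k_tokens: float = 0.0
    last_evaluated: str = ""                     # ISO date of the latest benchmark run

class ModelCatalog:
    def __init__(self):
        self._entries = {}

    def register(self, entry: CatalogEntry):
        # Re-registering a name overwrites the stale entry, which is how
        # a catalog stays current as models are refreshed or replaced.
        self._entries[entry.name] = entry

    def find(self, domain: str, max_cost: float = float("inf")):
        """Return entries suited to a domain within budget, cheapest first."""
        hits = [e for e in self._entries.values()
                if domain in e.domains and e.cost_per_1k_tokens <= max_cost]
        return sorted(hits, key=lambda e: e.cost_per_1k_tokens)
```

In practice the metadata would also carry evaluation scores and compliance attributes; the point of the sketch is only that a catalog is a living registry, not a static list.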

We revisit our base model LLM every few weeks to ensure that we’re still using the best model for Intuit’s use cases and needs — this update process can include fine-tuning, alignment via RLHF (reinforcement learning from human feedback), integration with RAG (retrieval augmented generation) and with tool usage capabilities, and more. All of these processes take into account parameters linked to output quality and data security.
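Of the processes mentioned above, retrieval augmented generation (RAG) is the most self-contained to illustrate: retrieve the documents most relevant to a query, then ground the model's prompt in them. The toy keyword-overlap retriever below is an assumption for illustration; production systems use embedding-based search.

```python
def retrieve(query, documents, k=2):
    """Rank documents by naive keyword overlap with the query."""
    q = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, documents):
    """Assemble a grounded prompt: retrieved context first, then the question."""
    context = retrieve(query, documents)
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"
```

The same shape holds regardless of the underlying model, which is one reason RAG integration can be updated independently of the base-model choice.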

Our strategy is to provide Intuit developers with a comprehensive selection of third-party and open-source LLMs, as well as foundational Intuit LLMs (FILMs) developed for product- or domain-specific use.

Challenge 2: Matching the right LLM to the right use case

While having a variety of LLMs to choose from is important, creating this catalog is not enough in and of itself. We need a way to guide developers to the right solutions efficiently and effectively. As part of that process, we also need a way to determine when it would be appropriate to develop a domain-specific LLM that fits use cases for which the best possible model doesn’t already exist.

Experimenting with multiple LLMs to see which one works best can be resource-, time- and cost-intensive. Our proprietary GenOS solves this challenge by offering developers a standard interface through which to interact with any of the LLMs in the ecosystem.
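A standard interface of this kind can be sketched as a small abstraction layer: every model implements the same method, and callers select a model by name. The class and registry names below are hypothetical stand-ins, not the actual GenOS API.

```python
from abc import ABC, abstractmethod

class LLMClient(ABC):
    """Common interface that every model in the ecosystem implements."""
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class ThirdPartyClient(LLMClient):
    def complete(self, prompt: str) -> str:
        return f"[third-party model] answer to: {prompt}"

class InHouseClient(LLMClient):
    def complete(self, prompt: str) -> str:
        return f"[in-house FILM] answer to: {prompt}"

# Registry mapping model names to clients behind the shared interface.
CLIENTS = {"third-party": ThirdPartyClient(), "film": InHouseClient()}

def complete(model: str, prompt: str) -> str:
    # Swapping models becomes a one-string change for the caller.
    return CLIENTS[model].complete(prompt)
```

Because every client honors the same contract, comparing models on the same prompts, or migrating an application from one model to another, requires no caller-side rewrites.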

We have developed state-of-the-art benchmarking and evaluation capabilities that blend manual evaluation by our experts with automatic evaluation, either against a predefined golden set or against a list of prompts and criteria (e.g., the industry-standard multi-task inference (MTI) benchmark). We have an internal leaderboard that provides developers with an up-to-date view of new evaluations, benchmarks, and models. And we have dashboards that help teams manage resource costs within budget parameters when moving to production with a given LLM.
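The golden-set style of automatic evaluation mentioned above can be reduced to a small scoring loop. The sketch below uses exact-match scoring for simplicity; real pipelines typically layer on semantic similarity or rubric-based grading, and the function names here are assumptions for illustration.

```python
def evaluate(model_fn, golden_set):
    """Score a model against (prompt, expected_answer) pairs.

    model_fn: callable that maps a prompt string to a completion string.
    Returns the fraction of prompts answered with an exact match.
    """
    correct = sum(1 for prompt, expected in golden_set
                  if model_fn(prompt).strip() == expected.strip())
    return correct / len(golden_set)
```

Running the same golden set against every model in the catalog yields directly comparable scores, which is what makes a leaderboard view possible.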

Challenge 3: Creating a paved path for fine-tuning LLMs

The right solution for a specific use case may not exist even in the best curated catalog of LLMs. Fine-tuning an existing LLM is a relatively efficient way to create something new without having to build a new LLM from scratch. Fine-tuning still requires resources, however. And how you incorporate new information into an existing model matters.

Incorporating information tailored to a particular industry, organization or customer can improve the speed and accuracy of a GenAI-powered application. However, incorporating that information comes with tradeoffs. The more use cases an LLM applies to, the larger it tends to get, requiring more resources and more time to get it to produce a solution. At the same time, every new model we produce has the potential to solve other use cases down the road.

Foundational Intuit LLMs (FILMs) provide a modular solution with clear benefits: a smaller, domain-specific model that is faster and cheaper to experiment with and deploy at scale than third-party offerings, and that can be fine-tuned to specialize further.

To maintain scalability, GenOS will use FILMs as the basis for fine-tuning. As we develop and train these models, we look carefully at the types of data on which we train them so they can support as many use cases as possible. Common starting points involve up-to-date domain-specific or corporate information in the public domain, such as marketing brochures or the federal tax code.
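A common first step in preparing such domain data for fine-tuning is splitting long documents (for example, a section of the tax code) into overlapping chunks of a size the training pipeline can consume. The chunking scheme and parameter names below are a simplified assumption, not Intuit's actual preprocessing.

```python
def make_examples(document, chunk_size=200, overlap=50):
    """Split a long domain document into overlapping word-level chunks
    suitable as fine-tuning samples. Overlap preserves context that
    would otherwise be cut at chunk boundaries."""
    words = document.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, max(len(words) - overlap, 1), step)]
```

From here, chunks would typically be paired with instructions or labels depending on the fine-tuning objective; the key design choice shown is the overlap, which trades a little redundancy for continuity across chunk boundaries.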

We also have to keep in mind that any new LLMs we generate through fine-tuning will require a certain amount of maintenance and updating. As domain-specific data changes (for example, when the tax code changes), we’ll need to incorporate these changes in our LLMs. Updates may also be required to encompass advancements in data processing. Altogether, this paved path will result in greater efficiencies when maintaining our expanding catalog of LLMs.

Challenge 4: Supporting a responsible development environment

To address the challenges above, GenOS provides our developers with a single, coherent system that offers the right LLMs for the job. Here at Intuit, we safeguard customer data and protect privacy using industry-leading technology and practices, and adhere to responsible AI principles that guide how our company operates and scales our AI-driven expert platform to benefit our consumer and small business customers.

Unlocking development velocity

The goal of Intuit’s proprietary GenOS is ultimately to support the development and deployment of GenAI-powered applications, with speed at scale. A system that can help developers identify, employ, measure and fine tune LLMs to solve for individual use cases is pivotal to achieving that goal. By addressing key challenges that can eat up development time and resources, our GenOS is unlocking GenAI development velocity.

For additional insight into our GenAI journey, see our Stack Overflow Best Practices for Building LLMs blog. And, to learn more about Intuit’s technology innovation, visit here.
