NeurIPS 2023: Intuit AI Research Presents Interactive Framework for Cost-Effective Fine-Tuning of Domain-Specific Language Models

Jiaxin Zhang
Intuit Engineering
Dec 14, 2023 · 4 min read

This blog is co-authored by Jiaxin Zhang, staff research scientist; Kamalika Das, manager of the AI Research Program; and Kumar Sricharan, VP and chief architect for AI at Intuit.

Large language models (LLMs) like ChatGPT have exploded into the public consciousness over the past year, capturing the imaginations of academia, industry, and government alike with their impressive in-context learning (ICL) abilities.

However, as LLMs expand in scale toward a trillion parameters and beyond, they require specialized hardware, massive-scale training data, and extensive computational power, all of which are inaccessible to most product or research teams. Furthermore, the generalizability of LLMs is largely determined by the scope of the underlying pre-training data. While use cases for this technology continue to mount, LLMs do not yet perform well out of the box in many real-world domains where specialized knowledge and accuracy are paramount, such as the highly regulated healthcare and finance industries.

As an alternative, small domain-specific language models (LMs) can be more favorable: they require less training data and are faster to compute, leading to shorter development cycles and lower operating costs. Developing such models involves classic pre-training as well as fine-tuning. To achieve performance comparable to generalized LLMs, they require high-quality manual annotations on target-domain data, which demand extensive human effort and expert knowledge. The key challenge is how to gather sufficient high-quality data, a critical ingredient in fine-tuning domain-specific LMs, under a limited human-annotation budget.

In partnership with Professor Bradley Malin and doctoral student Zhuohang Li at Vanderbilt University, the Intuit AI Research Program team is presenting a novel Interactive Multi-Fidelity Learning (IMFL) framework this week at NeurIPS 2023 for cost-effective development of domain-specific LMs. The framework alleviates manual effort during fine-tuning by using LLMs as knowledge bases that automatically annotate new data. Our approach capitalizes on the insight that different data samples inherently exhibit different levels of learning difficulty, so human annotation is not necessary for every sample. By discerning each sample’s difficulty level, we found that most annotation tasks can be delegated to automatic annotation tools such as LLMs, while a limited number of highly uncertain samples are assigned to human annotators. This reduces human labor significantly while maintaining high annotation quality.
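To make the routing idea concrete, here is a minimal sketch of difficulty-based annotation routing. The uncertainty measure (predictive entropy), the annotator stubs, and all function names are illustrative assumptions for this post, not the exact mechanics of the IMFL paper.

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted class distribution; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def human_annotate(sample):
    """Stub for the high-fidelity (human) annotator."""
    raise NotImplementedError("route to your human annotation tool")

def llm_annotate(sample):
    """Stub for the low-fidelity (LLM) auto-annotator, e.g. a GPT-3.5 call."""
    raise NotImplementedError("call your LLM annotation endpoint")

def route_annotations(samples, predict_proba, human_budget):
    """Send the most uncertain samples to humans; auto-annotate the rest with an LLM."""
    # Rank unlabeled samples by the current model's predictive uncertainty.
    ranked = sorted(samples, key=lambda s: entropy(predict_proba(s)), reverse=True)
    hard, easy = ranked[:human_budget], ranked[human_budget:]
    labeled = [(s, human_annotate(s)) for s in hard]   # costly, high quality
    labeled += [(s, llm_annotate(s)) for s in easy]    # cheap, sufficient for easy cases
    return labeled
```

The design choice this sketch captures is that the human budget caps only the hardest samples; everything below the uncertainty cutoff is delegated to the cheap annotator.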

To test the framework, we compared IMFL against two single-fidelity baselines, all-human annotation and all-LM annotation, to evaluate the effectiveness of the proposed multi-fidelity paradigm. The experimental analysis revealed several notable findings:

  • Employing IMFL can significantly reduce the high cost of human annotation in domain-specific tasks. In fact, LMs tuned through the proposed IMFL framework with GPT-3.5 as an auto-annotator significantly outperform LMs tuned with 3× human annotations, and are even on par with LMs tuned with 5× human annotations in some cases.
  • IMFL efficiently uses sparse human supervision to improve GPT-3.5 annotations through prompt retrieval and in-context learning (see the sketch after this list), ultimately leading to enhanced performance. IMFL surpasses LLM annotators and achieves highly competitive performance relative to human annotators, at a substantially lower cost and effort.
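As a rough illustration of the prompt-retrieval step, the sketch below finds the human-labeled examples most similar to each unlabeled sample and prepends them as few-shot demonstrations for the LLM annotator. The embedding function, prompt format, and the value of `k` are illustrative assumptions rather than the paper’s exact design.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def build_annotation_prompt(query_text, human_pool, embed, k=4):
    """Prepend the k most similar human-labeled examples as in-context demonstrations.

    human_pool: list of {"text": ..., "label": ...} dicts labeled by humans.
    embed: any text-embedding function mapping a string to a vector (assumed).
    """
    q = embed(query_text)
    # Rank the sparse human-labeled pool by similarity to the query sample.
    ranked = sorted(human_pool,
                    key=lambda ex: cosine(embed(ex["text"]), q),
                    reverse=True)
    demos = "\n\n".join(f"Text: {ex['text']}\nLabel: {ex['label']}"
                        for ex in ranked[:k])
    return f"{demos}\n\nText: {query_text}\nLabel:"
```

In this way, each human annotation is reused many times: once labeled, it can serve as an in-context demonstration that steers the LLM annotator on similar samples.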

Despite the promising performance, there are certain limitations to our approach that should be recognized:

  • The current IMFL framework assumes that the annotation budget is simply defined by the number of annotations, rather than reflecting the true cost, which in real-world scenarios typically involves multiple complex factors.
  • IMFL’s performance is limited by the size of the unannotated dataset and the diversity of the examples it contains, since IMFL improves performance only by annotating existing samples.
  • Limited by budgets and the capacity of the LM being fine-tuned, IMFL does not achieve state-of-the-art performance on some general natural language processing tasks, where directly adopting the latest LLMs remains the better choice.

In contrast to single-fidelity annotation (human-only or LLM-only), our results indicate that IMFL effectively addresses limitations related to cost, annotation quality, and efficiency. Looking ahead, we anticipate that IMFL’s performance will continue to improve as stronger LLM annotators, such as GPT-4, are incorporated. For a deep dive into the research, take a look at our NeurIPS 2023 paper and recording.

_________________________________________________________________

Intuit’s AI Research Program is an intrapreneurial function within the company that pushes the boundaries of AI. We develop and incubate AI-driven technology breakthroughs to solve our customers’ most important financial problems.

We’re a diverse team of research scientists, data scientists, and engineers with extensive expertise in AI, including natural language processing, generative AI, robust and explainable AI, symbolic AI, machine learning, and optimization.

To connect with us about open roles, partnerships or collaborations, contact ai-research@intuit.com
