Personalize LLMs for Your Business Needs with Cortex Fine-Tuning in Snowflake

Co-author: Arne Mauser

Photo Credit: Unsplash

Large Language Models (LLMs) have become game-changers in Natural Language Processing (NLP), offering remarkable capabilities in areas like text generation, language translation and sentiment analysis. The rise of transformer-based pretrained language models (PLMs), especially LLMs with billions of parameters, has revolutionized many NLP tasks. However, their large size and computational needs can be challenging, particularly when resources are limited and out-of-the-box accuracy falls short for a given task. These constraints, especially for domain-specific tasks, lead to the need to customize large language models.

A common way to customize a large language model is fine-tuning: taking a pre-trained model and further training it on a task-specific dataset. This approach builds on the knowledge already captured by the pre-trained model and boosts its performance on the target task. However, the increasing size of PLMs makes full task-specific fine-tuning complex and resource-intensive. Parameter-Efficient Fine-Tuning (PEFT) addresses this by updating only a small subset of parameters (or small added adapter layers) while keeping the rest of the model frozen, preserving the PLM’s general knowledge, adapting it to the target task, and minimizing catastrophic forgetting [1].
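To make the idea concrete, here is a minimal, illustrative LoRA-style adapter in PyTorch. It is only a sketch of the PEFT technique, not how Cortex implements it internally: the pre-trained weights stay frozen and only a small low-rank update (a tiny fraction of the parameters) is trained.

```python
# Illustrative LoRA-style PEFT adapter (a sketch, not Cortex internals).
# Only the small matrices A and B are trainable; the pre-trained layer stays frozen.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # freeze the pre-trained weights
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path W·x plus the trainable low-rank update (B·A)·x
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(4096, 4096))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable parameters: {trainable:,} of {total:,}")  # roughly 0.4% of the layer
```

Because only the adapter weights are updated, the memory and compute needed for training drop sharply while the base model’s knowledge is left intact.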

Today, there is a multitude of large language models to choose from, which leaves customers uncertain about which foundation model best suits their specific needs. With so many options, some of the common concerns we hear from customers include:

  • Increasing performance with limited time and resources
  • Maximizing accuracy without pre-training a foundation model from scratch
  • Evaluating and verifying model results
  • Reducing computational resource demands and memory usage during fine-tuning
  • Tailoring AI to their unique business requirements

While prompt engineering techniques applied to LLMs are generally effective, they can underperform on task-specific problems. Fine-tuning helps overcome these gaps, offering benefits like lower computational costs and access to leading-edge models without pre-training from scratch.

Snowflake provides a managed service for fine-tuning, enabling organizations to refine popular LLMs using their own data, all within Snowflake’s platform. With Cortex Fine-Tuning, users can employ parameter-efficient fine-tuning (PEFT) to train custom adapters on pre-trained models for more specialized tasks. Let’s explore how this feature works and how a hypothetical company, Tasty Bytes, improved its customer experience quickly and efficiently.

Figure 1: Snowflake — A unified platform for end-to-end Gen AI and ML

Business Use Case

Tasty Bytes is a fictional global food truck enterprise with a presence in 30 cities across 15 countries and a network of 450 trucks offering 15 diverse menu types under various brands. Tasty Bytes is committed to improving the customer experience by leveraging the power of AI with Snowflake Cortex. We will walk through how Tasty Bytes developed an automated support agent using low-code, customized LLM models. The agent delivers empathetic and engaging conversations, provides 24/7 support, and ensures quick issue resolution, all with minimal coding effort.

Missing information in emails sent to customer support is a common issue observed by the Tasty Bytes team. In this case, the support team needs at minimum two fields: the city and the truck where the item was purchased. Because support cannot proceed without this information, an agent has to send a follow-up email asking a clarifying question to obtain the missing details.

Figure 2: An excerpt from a customer support email with partial information

Adapting LLMs to business use cases

Tasty Bytes wanted to leverage LLMs for their business needs but faced challenges in selecting the right model, determining the most effective technique, and achieving the highest accuracy.

To decide which technique best suits these needs and delivers the highest accuracy, let’s explore each approach and compare the accuracy at each step.

1. Prompt Engineering, a naive approach for domain-specific tasks

Instruction prompting, a form of prompt engineering, calls for a detailed grasp of the model’s capabilities and takes advantage of the knowledge already inherent in the model. Because it relies solely on strategic input formulation, it requires neither a labeled dataset nor training compute; the only cost is the number of tokens consumed at inference time. It is a very easy technique that needs little sophistication (a minimal sketch follows Figure 3 below).

The drawbacks of this approach are:

  • Requires hand-crafting a prompt.
  • More challenging to utilize labeled data.
  • Restricted by the context window.
  • Less reliable output steering.

Figure 3: Prompt Engineering Template
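As a concrete baseline, the sketch below shows what this prompt-only approach could look like from Snowpark Python using the SNOWFLAKE.CORTEX.COMPLETE function. The SUPPORT_EMAILS table, its BODY column, and the connection parameters are hypothetical names used purely for illustration.

```python
# Sketch: zero-shot prompt engineering with SNOWFLAKE.CORTEX.COMPLETE.
# SUPPORT_EMAILS / BODY are hypothetical names; fill in your own connection details.
from snowflake.snowpark import Session

connection_parameters = {
    "account": "<account>", "user": "<user>", "password": "<password>",
    "warehouse": "<warehouse>", "database": "<database>", "schema": "<schema>",
}
session = Session.builder.configs(connection_parameters).create()

prompt_sql = """
SELECT
    BODY,
    SNOWFLAKE.CORTEX.COMPLETE(
        'mistral-7b',
        'Extract the city and the food truck name from the customer support email below. '
        || 'Respond only with JSON containing the keys "city" and "truck"; '
        || 'use null for any field that is missing. Email: '
        || BODY
    ) AS extracted_fields
FROM SUPPORT_EMAILS
"""
session.sql(prompt_sql).show()
```

Even with a carefully hand-crafted instruction like this, the output format and accuracy can drift, which is exactly the steering problem fine-tuning addresses next.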

2. Fine Tuning

Let’s explore the next option available to us. Fine-tuning allows users to adapt pre-trained LLMs to more specialized tasks. By fine-tuning a model on a small dataset of task-specific data, you can improve its performance on that task while preserving its general language knowledge. With fine-tuning, a small labeled dataset of prompt-and-completion pairs is used to train a small portion of the foundation model. In Snowflake, this training data typically comes from a table or query with prompt and completion columns (in other tooling, often a JSONL file).
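A minimal sketch of launching a fine-tuning job from Snowpark Python follows. The table names and the SUPPORT_EXTRACTOR model name are hypothetical, and the shape of the SNOWFLAKE.CORTEX.FINETUNE call reflects the function as documented during public preview, so check the current Snowflake documentation before relying on it.

```python
# Sketch: kick off a Cortex Fine-Tuning job and check on it (names are illustrative).
# SUPPORT_FINETUNE_TRAIN / SUPPORT_FINETUNE_VAL hold prompt-and-completion pairs.
from snowflake.snowpark import Session

session = Session.builder.getOrCreate()  # assumes a default Snowflake connection is configured

job_id = session.sql("""
    SELECT SNOWFLAKE.CORTEX.FINETUNE(
        'CREATE',
        'SUPPORT_EXTRACTOR',      -- name of the resulting fine-tuned model
        'mistral-7b',             -- base model to adapt
        'SELECT prompt, completion FROM SUPPORT_FINETUNE_TRAIN',
        'SELECT prompt, completion FROM SUPPORT_FINETUNE_VAL'
    )
""").collect()[0][0]

# Poll the job; once it finishes, SUPPORT_EXTRACTOR can be called with
# SNOWFLAKE.CORTEX.COMPLETE just like any base model.
print(session.sql(f"SELECT SNOWFLAKE.CORTEX.FINETUNE('DESCRIBE', '{job_id}')").collect())
```

Because the job runs as a managed, serverless service, there is no GPU infrastructure to provision or tune.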

With just a few hundred examples, accuracy improves dramatically and cost drops quickly. If there is a compelling need for fully custom fine-tuning, switch to Snowpark Container Services, where you can bring your own model and fine-tune it with your own parameters and strategies.

The Benchmarking Process

The following charts show benchmarking of a fine-tuned LLM from Cortex AI against other foundation models. The fine-tuned Mistral 7B came out ahead in every category. The comparison covers:

  • Accuracy
  • Cost
  • Efficiency (Context window and others)
1. Increased accuracy compared to other foundation models

Figure 4: Prompting vs Fine-Tuning Accuracy Comparison

2. Same accuracy but at a lower cost compared to other foundation models

Figure 5: Prompting vs Fine-Tuning Cost Comparison

3. Increased efficiency due to a smaller context window, and hence lower cost

Figure 6: Prompting vs Fine-Tuning Efficiency Comparison
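To verify results on your own data, a simple field-level check against a held-out labeled set is often enough. The sketch below assumes a hypothetical SUPPORT_EVAL table with BODY, CITY_LABEL, and TRUCK_LABEL columns, and reuses the hypothetical SUPPORT_EXTRACTOR fine-tuned model from the earlier sketch.

```python
# Sketch: exact-match accuracy of the fine-tuned model on a held-out, labeled set.
# SUPPORT_EVAL and SUPPORT_EXTRACTOR are hypothetical names used for illustration.
import json
from snowflake.snowpark import Session

session = Session.builder.getOrCreate()  # assumes a default Snowflake connection is configured

rows = session.sql("""
    SELECT
        SNOWFLAKE.CORTEX.COMPLETE('SUPPORT_EXTRACTOR', BODY) AS prediction,
        CITY_LABEL,
        TRUCK_LABEL
    FROM SUPPORT_EVAL
""").collect()

correct = 0
for row in rows:
    try:
        pred = json.loads(row["PREDICTION"])
    except (json.JSONDecodeError, TypeError):
        continue  # unparseable output counts as incorrect
    if pred.get("city") == row["CITY_LABEL"] and pred.get("truck") == row["TRUCK_LABEL"]:
        correct += 1

print(f"exact-match accuracy: {correct / max(len(rows), 1):.1%}")
```

Running the same check against the prompt-only baseline gives a like-for-like comparison, on your own data and labels, similar to the charts above.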

For highly accurate, production-ready GenAI solutions at a low cost, bring your use cases to Snowflake Cortex and realize the transformative value immediately. Follow the step-by-step instructions on this Snowflake QuickStart page to get your business up and running with customized LLMs using Cortex Fine-Tuning. If you are interested in other practical applications and benefits of Snowflake in the Tasty Bytes series, visit the Snowflake Developers page dedicated to Tasty Bytes here.

Conclusion

AI is one of the largest opportunities to transform your business, and customizing AI to your business in a reliable and efficient manner is the need of the day. The Cortex AI suite covers most AI and fine-tuning use cases. Overall, we saw how Cortex Fine-Tuning enables serverless customization and management of LLMs inside Snowflake. It is simple and cost-efficient: you just need your data in the Snowflake Data Cloud and off you go!

References

[1] Parameter-Efficient Fine-Tuning (PEFT): https://arxiv.org/pdf/2312.12148

Note: Cortex Fine-Tuning is in Public Preview at the time of this writing.
