DSPy + Snowflake:

Towards Secure, Future-Proof, and Cost-Efficient LLM Pipelines

Building applications with large language models (LLMs) requires overcoming several key challenges — securing proprietary data, engineering LLM-specific prompts, and maintaining robust and reliable systems. The combination of Snowflake Cortex with DSPy provides a comprehensive solution to address these obstacles.

In this post, we’ll walk through how to leverage DSPy’s new Snowflake integration to set up, build, evaluate, and optimize an example Retrieval-Augmented Generation (RAG) program with DSPy and Snowflake.

We demonstrate that, by algorithmically tuning a Mixtral 8x7B pipeline with DSPy and Snowflake Cortex, we can outperform a larger Llama3–70B pipeline, while delivering nearly 4X cost savings.

At the time of writing, Mixtral 8x7B costs 0.22 credits per 1M tokens on Snowflake Cortex versus 1.21 credits per 1M tokens for Llama3–70B.

Why Snowflake Cortex?

To address the security and infrastructure management challenges of building LLM systems, Snowflake Cortex provides users with instant access to industry-leading LLMs. Through this fully managed, secure, and scalable service, proprietary data never leaves the governed Snowflake ecosystem.

Why DSPy?

The Stanford NLP Group developed DSPy to eliminate the need for manual prompt engineering and to empower users to write maintainable LLM pipelines. With this open source solution, rather than manually iterating prompts until each pipeline step works in isolation, DSPy algorithmically optimizes the LLM prompts and weights. Any change to the pipeline, model, or data no longer requires rebuilding components from scratch.

DSPy + Snowflake Setup

The DSPy Snowflake integration is available starting with DSPy version 2.4.10. You can use pip to install it along with the Snowflake requirements.

pip install "dspy-ai[snowflake]"
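
The snippets that follow assume a connection_parameters dictionary holding your Snowflake account details. A minimal sketch (every value below is a placeholder for your own settings):

# Hypothetical connection settings; replace each value with your own
connection_parameters = {
    "account": "<your_account_identifier>",
    "user": "<your_user>",
    "password": "<your_password>",
    "role": "<your_role>",
    "warehouse": "<your_warehouse>",
    "database": "<your_database>",
    "schema": "<your_schema>",
}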

For this example, we’ve pre-loaded embeddings from 5 years of Snowflake’s annual reports into a Snowflake table called SEC_EMBEDDINGS. We used the SnowVectorDB utility to automatically load the annual reports from a local directory, generate the embeddings, and insert them into a Snowflake table. We can do this as follows:

from snowflake.snowpark import Session
from snowvecdb import SnowVectorDB

# Create a Snowpark session from your connection parameters
snowpark = Session.builder.configs(connection_parameters).create()

# Chunk documents (size 500, overlap 75) before embedding
SVDB = SnowVectorDB(snowflake_session=snowpark, chunk_size=500, chunk_overlap=75)

# Embed the annual reports and load them into a Snowflake table
SVDB(
    vector_table_name="SEC_EMBEDDINGS",
    data_source_directory="your_local_path_to_annualreports"
)
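
Once the load completes, a quick sanity check with the same Snowpark session can confirm the table was populated (a sketch; the row count depends on your documents and chunking settings):

# Sanity check: confirm the embeddings table was populated
snowpark.sql("SELECT COUNT(*) FROM SEC_EMBEDDINGS").show()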

Once our embeddings are in Snowflake, we can ensure that no data leaves the Snowflake governed ecosystem by configuring DSPy to use a Snowflake Cortex language model (LM) with a Snowflake retriever model (RM). We use dspy.Snowflake and SnowflakeRM below to do this.

import dspy
from dspy.retrieve.snowflake_rm import SnowflakeRM

# Snowflake Cortex language model definition
turbo = dspy.Snowflake(model="mixtral-8x7b", credentials=connection_parameters)

# Snowflake retriever model definition
snowflake_retriever = SnowflakeRM(
    snowflake_table_name="SEC_EMBEDDINGS",
    snowflake_credentials=connection_parameters
)

# Configure which LM and RM to use in DSPy
dspy.settings.configure(lm=turbo, rm=snowflake_retriever)

By configuring the LM and RM settings above, we ensure that future pipeline runs will:

  • retrieve relevant context from the SEC_EMBEDDINGS Snowflake table
  • generate responses to user queries using a Mixtral 8x7B model hosted by Snowflake Cortex

Modular LLM Pipelines with DSPy + Snowflake

Once our initial setup and configuration are complete, we can use DSPy to implement a simple RAG application that is decoupled from any particular prompt, language model, or data source.

Two building blocks of DSPy programs are Signatures and Modules. First, we use a dspy.Signature to specify our desired inputs and outputs; then, with a dspy.Module, we define the control flow of our program.

class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 10 words")


class RAG(dspy.Module):
    def __init__(self, num_passages=5):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)

DSPy’s abstractions allow us to decouple our pipeline implementation from the underlying data, prompt, and language model.

When we call this RAG pipeline:

  • dspy.Retrieve will leverage the user-configured retriever model (snowflake_retriever) to get the relevant context from our SEC_EMBEDDINGS table
  • dspy.ChainOfThought will submit a chain-of-thought prompt to our user-configured language model (turbo, a Snowflake-managed Mixtral 8x7B model) to generate a response

If we want to test different models or use a different knowledge base, all we have to do is update the configuration settings.
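
For instance, swapping in a different Cortex model and knowledge base is just a reconfiguration (a sketch; the model name is one of Cortex’s offerings, and OTHER_EMBEDDINGS is a hypothetical table):

# Swap the LM and RM without touching the RAG class itself
alternate_lm = dspy.Snowflake(model="llama3-8b", credentials=connection_parameters)
alternate_rm = SnowflakeRM(
    snowflake_table_name="OTHER_EMBEDDINGS",  # hypothetical embeddings table
    snowflake_credentials=connection_parameters
)
dspy.settings.configure(lm=alternate_lm, rm=alternate_rm)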

Basic Pipeline Usage

After configuring DSPy and defining the pipeline, we can initialize and start using the program.

rag = RAG()
rag("In what fiscal year did Snowflake IPO?")

Under the hood, our RAG program retrieves the relevant passages from our Snowflake embeddings table, injects that context into a chain of thought prompt, and sends it to the Snowflake Cortex model that we’ve configured. Below is the DSPy-generated prompt:

Answer questions with short factoid answers.

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the answer}. We ...

Answer: often between 1 and 10 words

---

Context:
[1] «fiscal year ended January 31, 2021 filed with the SEC on March 31, 2021.
Overview
We believe in a data connected world where organizations have seamless access to explore, share, and unlock the value of data. To realize this vision, we deliver the Data
Cloud, a network where Snowflake customers, partners, data providers, and data consumers can break down data silos and derive value from rapidly growing data sets in secure,
governed, and compliant ways.
Our platform is the innovative technology that powers the Data Cloud, enabling customers to consolidate data into a single source of truth to drive meaningful business insights,
build data-driven applications, and share data. We provide our platform through a customer-centric, consumption-based business model, only charging customers for the resources
they use.
.......
[4] «Our platform is the innovative technology that powers the Data Cloud, enabling customers to consolidate data into a single source of truth to drive meaningful insights, apply
AI to solve business problems, build data applications, and share data and data products. We provide our platform through a customer-centric, consumption-based business model,
only charging customers for the resources they use.
...
Our cloud-native architecture consists of three independently scalable but logically integrated layers across compute, storage, and cloud services. The compute layer provides
dedicated resources to enable users to simultaneously access common data sets for many use cases with minimal latency. The storage layer ingests massive amounts and varieties of
structured, semi-structured, and unstructured data to create a unified data record. The cloud services layer intelligently optimizes each use case’s performance requirements with no
administration. This architecture is built on three major public clouds across 40 regional deployments around the world. These deployments are generally interconnected to deliver
the Data Cloud, enabling a consistent, global user experience.»

Question: In what fiscal year did snowflake IPO?

Reasoning: Let's think step by step in order to The fiscal year that Snowflake went public (IPO) was 2021. Context: [1] «fiscal year ended January 31, 2021 filed with the SEC on March 31, 2021.»

Answer:{'messages': ' Fiscal year 2021.'}

So far, we’ve demonstrated how an LLM pipeline can be declaratively written using DSPy and Snowflake Cortex.

Flexible LLM Performance Evaluation

Once we’ve defined the program, we need a way to evaluate performance. There are a variety of common metrics used in industry for benchmarking LLM system performance. There have also been recent advancements in using third-party agents to judge the performance of LLMs (for example, see LLM-as-a-Judge by LMSYS or prior work on LM cross-examination by DeepMind).

Any of the above approaches can be implemented with DSPy, because it allows us to use arbitrary Python functions, as well as other DSPy programs, to define, measure, and optimize evaluation metrics.
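
In its simplest form, a DSPy metric is just a function that takes an example, a prediction, and an optional trace, and returns a score. As an illustration (this trivial exact-match metric is our own sketch, not part of the demo):

def exact_match(example, pred, trace=None):
    # Case-insensitive string comparison against the ground truth answer
    return example.answer.lower() == pred.answer.lower()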

In the example below, we use an LLM-as-a-Judge approach to compare the performance of our Mixtral 8x7B pipeline with a larger Llama3–70B pipeline. To do this, we define a semantic similarity metric that uses an LLM to determine whether the predicted response is semantically correct based on the ground truth answer. We want an independent judge to evaluate performance, so we’ll use the DSPy context manager to configure a Reka Flash model for the assessment.

class Judge(dspy.Signature):
    """Judge if the predicted answer contains the ground truth answer."""

    ground_truth = dspy.InputField(desc="ground truth")
    prediction = dspy.InputField(desc="predicted answer")
    assessment_answer: bool = dspy.OutputField(desc="only True or False without any rationale")


# An independent Reka Flash judge, also hosted on Snowflake Cortex
reka = dspy.Snowflake(model="reka-flash", credentials=connection_parameters)
judge = dspy.ChainOfThought(Judge)


def semantic_similarity(example, pred, trace=None):
    # Temporarily swap in the judge LM for the assessment
    with dspy.settings.context(lm=reka):
        equivalent = judge(ground_truth=example.answer, prediction=pred.answer)
    return "true" in equivalent.assessment_answer.lower()

For this evaluation exercise, we’ll use the industry-standard HotPotQA dataset to test pipeline performance. We need an appropriate knowledge base and retriever model for this use case, so we swap our configuration settings to use a publicly hosted ColBERTv2 retriever with Wikipedia abstracts:

colbertv2_wiki17_abstracts = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')
dspy.settings.configure(rm=colbertv2_wiki17_abstracts)
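
The evaluation below assumes trainset and devset splits. One way to construct them is DSPy’s built-in HotPotQA loader (split sizes here are illustrative):

from dspy.datasets import HotPotQA

# Load small train/dev splits and mark "question" as the input field
dataset = HotPotQA(train_seed=1, train_size=20, eval_seed=2023, dev_size=50, test_size=0)
trainset = [x.with_inputs("question") for x in dataset.train]
devset = [x.with_inputs("question") for x in dataset.dev]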

To evaluate our baseline Mixtral 8x7B pipeline performance using the semantic similarity metric, we use DSPy’s built-in evaluation utility:

from dspy.evaluate.evaluate import Evaluate

evaluate = Evaluate(devset=devset, num_threads=1, display_progress=True, display_table=0)
evaluate(RAG(), semantic_similarity)

To account for the non-deterministic nature of LLMs, we execute 5 evaluation runs with our Reka Flash Judge to get an average baseline accuracy score of 75% for our Mixtral 8x7B pipeline.
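
A minimal sketch of one way to average those runs, assuming each call to evaluate returns the aggregate score:

import statistics

# Average the aggregate metric over 5 independent evaluation runs
scores = [evaluate(RAG(), semantic_similarity) for _ in range(5)]
print(statistics.mean(scores))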

For comparison, we also test the performance using Llama3–70B in our pipeline.

llama_turbo = dspy.Snowflake(model="llama3-70b", credentials=connection_parameters)

# Temporarily swap the pipeline's LM to Llama3-70B
with dspy.context(lm=llama_turbo):
    print(evaluate(RAG(), semantic_similarity))

Executing 5 evaluation runs with our Reka Flash Judge returns an average accuracy score of 83% for our Llama3–70B pipeline. As expected, we find that the larger Llama3–70B pipeline outperforms Mixtral 8x7B (by about 10% on this use case).

Pipeline Optimization Enables 4X Cost Savings

One of the key benefits of using DSPy, beyond being able to write modular and declarative LLM pipelines, is the ability to algorithmically tune prompts. DSPy has several built-in optimizers that employ different strategies for tuning the parameters of your DSPy program. The optimizer will maximize a given metric: in our case, semantic similarity.

Below we use one of DSPy’s few-shot learning optimizers. This method randomly samples question-and-answer pairs from the training data, injects them into our prompt, and bootstraps additional training examples using the LM.

The optimizer also allows us to use a larger teacher model like Mistral Large to help us train the Mixtral 8x7B pipeline. By using a larger model during training and a smaller model in our deployed pipeline, we take advantage of the larger model’s performance with the cost profile of the smaller model. To accomplish this, we configure the teacher_settings argument when we compile the pipeline.

from dspy.teleprompt import BootstrapFewShotWithRandomSearch

# A larger teacher model to bootstrap demonstrations during compilation
mistral = dspy.Snowflake(model="mistral-large", credentials=connection_parameters)
optimizer = BootstrapFewShotWithRandomSearch(metric=semantic_similarity)

optimized_pipeline = optimizer.compile(
    RAG(),
    trainset=trainset,
    teacher_settings=dict(lm=mistral)
)

evaluate(optimized_pipeline, semantic_similarity)

After executing 5 evaluation runs of the optimized pipeline, we find that using the optimizer improves the Mixtral 8x7B pipeline performance by almost 20%, to an average of 88% accuracy.
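
Because the tuned prompts and demonstrations live in the compiled program, the optimized pipeline can be persisted and reloaded later (a sketch using DSPy’s module save/load, with a file name of our choosing):

# Persist the tuned state (demos, instructions) to JSON
optimized_pipeline.save("optimized_rag.json")

# Reload it into a fresh instance of the same program
reloaded_pipeline = RAG()
reloaded_pipeline.load("optimized_rag.json")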

By optimizing our Mixtral 8x7B pipeline with DSPy + Snowflake, we outperform our Llama3–70B baseline while using a model that is roughly 5X cheaper per token (0.22 versus 1.21 credits per 1M tokens). Note that the optimized prompt is longer than the unoptimized one, so the net cost savings come out to nearly 4X.

Under the hood, the DSPy optimizer has tuned our prompt as follows:

Answer questions with short factoid answers.

---

Question: Who composed "Sunflower Slow Drag" with the King of Ragtime?
Answer: Scott Hayden

Question: The Organisation that allows a community to influence their operation or use and to enjoy the benefits arising was founded in what year?
Answer: 2010

Question: The Victorians - Their Story In Pictures is a documentary series written by an author born in what year?
Answer: 1950

........

Question: Who was coach of the No. 9-ranked team that was upset in the NCAA Tournament by the 2014-15 UAB Blazers men's basketball team?
Answer: Fred Hoiberg

Question: Do Stu Block and Johnny Bonnel's bands play the same type of music?
Answer: no

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the answer}. We ...

Answer: often between 1 and 10 words

---

Context:
[1] «The Sword in the Stone (film) | The Sword in the Stone is a 1963 American animated musical fantasy comedy film produced by Walt Disney and released by Buena Vista Distribution. The 18th Disney animated feature film, it was the final Disney animated film to be released before Walt Disney's death. The songs in the film were written and composed by the Sherman Brothers, who later wrote music for other Disney films like "Mary Poppins" (1964), "The Jungle Book" (1967), "The Aristocats" (1970), and "Bedknobs and Broomsticks" (1971).»
[2] «A Symposium on Popular Songs | A Symposium on Popular Songs is a special cartoon featurette made by Walt Disney Productions in 1962. It features songs written by the Sherman Brothers with music arrangements by Tutti Camarata. The Shermans also co-wrote the screenplay but are not credited for this. Host Ludwig Von Drake invites his audience into his mansion where he tells all about popular music through the years, introducing several songs illustrated with stop-motion photography. The film was nominated for an Academy Award for Best Animated Short Film. It was released on DVD in 2005 as part of the Walt Disney Treasures set "Disney Rarities".»
[3] «Winnie the Pooh and the Blustery Day | Winnie the Pooh and the Blustery Day is a 1968 animated featurette based on the third, fifth, ninth, and tenth chapters from "Winnie-the-Pooh" and the second, eighth, and ninth chapters from "The House at Pooh Corner" by A. A. Milne. The featurette was produced by Walt Disney Productions and released by Buena Vista Distribution Company on December 20, 1968 as a double feature with "The Horse in the Gray Flannel Suit". This was the second of the studio's Winnie the Pooh shorts. It was later added as a segment to the 1977 film "The Many Adventures of Winnie the Pooh". The music was written by Richard M. Sherman and Robert B. Sherman. It was notable for being the last animated short produced by Walt Disney, who died during its production.»
[4] «Winnie the Pooh and the Honey Tree | Winnie the Pooh and the Honey Tree is a 1966 animated featurette based on the first two chapters of the book "Winnie-the-Pooh" by A. A. Milne. The film combines live-action and animation, and was produced by Walt Disney Productions. Its songs were written by the Sherman Brothers (Richard M. Sherman and Robert B. Sherman) and the score was composed and conducted by Buddy Baker.»
[5] «Robert B. Sherman | Robert Bernard Sherman (December 19, 1925 – March 6, 2012) was an American songwriter who specialized in musical films with his brother Richard Morton Sherman. According to the official Walt Disney Company website and independent fact checkers, "the Sherman Brothers were responsible for more motion picture musical song scores than any other songwriting team in film history." Some of the Sherman Brothers' best known songs were incorporated into live action and animation musical films including: "Mary Poppins", "The Jungle Book", "The Many Adventures of Winnie the Pooh", "Chitty Chitty Bang Bang", "The Slipper and the Rose", and "Charlotte's Web". Their most well known work, however, remains the theme park song "It's a Small World (After All)". According to Time.com, this song is the most performed song of all time.»

Question: Which company distributed this 1977 American animated film produced by Walt Disney Productions for which Sherman Brothers wrote songs?

Reasoning: Let's think step by step in order to identify the distributor of the 1977 American animated film produced by Walt Disney Productions for which Sherman Brothers wrote songs. We know from context [3] that "Winnie the Pooh and the Blustery Day" is a 1968 animated featurette for which the music was written by Richard M. Sherman and Robert B. Sherman. It was released by Buena Vista Distribution Company. However, the question refers to a 1977 film. According to context [5], the Sherman Brothers wrote songs for "The Many Adventures of Winnie the Pooh". This film was released in 1977 and was produced by Walt Disney Productions.

Answer: Buena Vista Distribution

.......

Context:
[1] «Battle of Kursk | The Battle of Kursk was a Second World War engagement between German and Soviet forces on the Eastern Front near Kursk (450 km south-west of Moscow) in the Soviet Union during July and August 1943. The battle began with the launch of the German offensive, Operation Citadel (German: "Unternehmen Zitadelle" ), on 5 July, which had the objective of pinching off the Kursk salient with attacks on the base of the salient from north and south simultaneously. After the German offensive stalled on the northern side of the salient, on 12 July the Soviets commenced their Kursk Strategic Offensive Operation with the launch of Operation Kutuzov (Russian: Кутузов ) against the rear of the German forces in the northern side. On the southern side, the Soviets also launched powerful counterattacks the same day, one of which led to a large armoured clash, the Battle of Prokhorovka. On 3 August, the Soviets began the second phase of the Kursk Strategic Offensive Operation with the launch of Operation Polkovodets Rumyantsev (Russian: Полководец Румянцев ) against the German forces in the southern side of the Kursk salient.»
[2] «Operation Mars | Operation Mars, also known as the Second Rzhev-Sychevka Offensive Operation (Russian: Вторая Ржевско-Сычёвская наступательная операция), was the codename for an offensive launched by Soviet forces against German forces during World War II. It took place between 25 November and 20 December 1942 around the Rzhev salient in the vicinity of Moscow.»
[3] «Kholm Pocket | The Kholm Pocket (German: "Kessel von Cholm" ; Russian: Холмский котёл ) was the name given for the encirclement of German troops by the Red Army around Kholm south of Leningrad, during World War II on the Eastern Front, from 23 January 1942 until 5 May 1942. A much larger pocket was simultaneously surrounded in Demyansk, about 100 km to the northeast. These were the results of German retreat following their defeat during the Battle of Moscow.»
[4] «Operation Bagration | Operation "Bagration" ( ; Russian: Oперация Багратио́н , Operatsiya "Bagration") was the codename for the Soviet 1944 Belorussian Strategic Offensive Operation, (Russian: Белорусская наступательная операция «Багратион» , Belorusskaya nastupatelnaya Operatsiya "Bagration") a military campaign fought between 22 June and 19 August 1944 in Soviet Byelorussia in the Eastern Front of World War II. The Soviet Union achieved a major victory by destroying the German Army Group Centre and completely rupturing the German front line.»
[5] «Operation Uranus | Operation "Uranus" (Russian: Опера́ция «Ура́н», romanised: "Operatsiya "Uran"" ) was the codename of the Soviet 19–23 November 1942 strategic operation in World War II which led to the encirclement of the German Sixth Army, the Third and Fourth Romanian armies, and portions of the German Fourth Panzer Army. The operation formed part of the ongoing Battle of Stalingrad, and was aimed at destroying German forces in and around Stalingrad. Planning for Operation "Uranus" had commenced in September 1942, and was developed simultaneously with plans to envelop and destroy German Army Group Center and German forces in the Caucasus. The Red Army took advantage of the German army's poor preparation for winter, and the fact that its forces in the southern Soviet Union were overstretched near Stalingrad, using weaker Romanian troops to guard their flanks; the offensives' starting points were established along the section of the front directly opposite Romanian forces. These Axis armies lacked heavy equipment to deal with Soviet armor.»

Question: What is the code name for the German offensive that started this Second World War engagement on the Eastern Front (a few hundred kilometers from Moscow) between Soviet and German forces, which included 102nd Infantry Division?

Reasoning: Let's think step by step in order to produce the answer. We need to identify the specific battle being referred to, and then find the code name for the German offensive that initiated it. The context provides information about several battles, but only one fits the criteria of being a few hundred kilometers from Moscow and involving both Soviet and German forces. That would be the Battle of Kursk. The code name for the German offensive that started this battle was Operation Citadel.

Answer: Operation Citadel

---

Context:
[1] «David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinnairdy Castle in 1664. Three of his twenty-nine children became mathematics professors. He is credited with inventing a military cannon that Isaac Newton described as "being destructive to the human species". Copies and details of the model no longer exist. Gregory's use of a barometer to predict farming-related weather conditions led him to be accused of witchcraft by Presbyterian ministers from Aberdeen, although he was never convicted.»
[2] «Gregory Tarchaneiotes | Gregory Tarchaneiotes (Greek: Γρηγόριος Ταρχανειώτης , Italian: "Gregorio Tracanioto" or "Tracamoto" ) was a "protospatharius" and the long-reigning catepan of Italy from 998 to 1006. In December 999, and again on February 2, 1002, he reinstituted and confirmed the possessions of the abbey and monks of Monte Cassino in Ascoli. In 1004, he fortified and expanded the castle of Dragonara on the Fortore. He gave it three circular towers and one square one. He also strengthened Lucera.»
[3] «David Gregory (mathematician) | David Gregory (originally spelt Gregorie) FRS (? 1659 – 10 October 1708) was a Scottish mathematician and astronomer. He was professor of mathematics at the University of Edinburgh, Savilian Professor of Astronomy at the University of Oxford, and a commentator on Isaac Newton's "Principia".»
[4] «David Gregory (historian) | David Gregory (1696–1767) was an English churchman and academic, Dean of Christ Church, Oxford and the first Regius Professor of Modern History at Oxford.»
[5] «Gregory of Gaeta | Gregory was the Duke of Gaeta from 963 until his death. He was the second son of Docibilis II of Gaeta and his wife Orania. He succeeded his brother John II, who had left only daughters. Gregory rapidly depleted the "publicum" (public land) of the Duchy of Gaeta by doling it out to family members as grants. Gregory disappears from the records in 964 and was succeeded by his younger brother Marinus of Fondi over the heads of his three sons. It is possible that there was an internal power struggle between factions of the Docibilan family and that Gregory was forced out. On the other hand, perhaps he died and his sons fought a losing battle for their inheritance to Gaeta.»

Question: What castle did David Gregory inherit?

Reasoning: Let's think step by step in order to Kinnairdy Castle in 1664. Reference(s): [1] Context: "David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinnairdy Castle in 1664. Three of his twenty-nine children became mathematics professors."

Answer:{'messages': ' Kinnairdy Castle in 1664.\n\nReference(s):\n[1] Context: "David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinnairdy Castle in 1664. Three of his twenty-nine children became mathematics professors."'}

Above we see the combination of question-and-answer pairs and generated examples that our optimizer selected to maximize our semantic similarity score.
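
To inspect the latest prompt yourself, DSPy keeps a call history on each LM object (output formatting may vary across DSPy versions):

# Print the most recent prompt/completion pair sent to the Mixtral LM
turbo.inspect_history(n=1)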

Conclusion

DSPy is revolutionizing how LLM pipelines are built. It allows developers to decouple their LLM pipelines from the underlying language models and prompts. Additionally, it eliminates the need for laborious prompt engineering by automating the process.

Snowflake Cortex empowers practitioners to leverage best-in-class LLMs without having to move their data out of the platform in which it’s stored.

The combination of DSPy with Snowflake enables practitioners to build secure, performant, easy-to-maintain, and cost-efficient LLM systems.

To learn more about DSPy, we encourage you to visit the documentation. For the complete DSPy + Snowflake demo notebook, you can visit the demo repo here.
