Transforming 40+ minute call transcripts into structured JSON summaries using mistral-large in Snowflake Cortex AI

Publication written in partnership with ryan n from Compare Club.

Compare Club helps millions of Australian consumers make informed purchasing decisions on products and services such as health and life insurance, energy, home loans, and more. By treating data as a valuable tool, our experts make it easier and faster for customers to sign up for the most appropriate product and get more value from their budgets.

One area where we were not using data to its full potential was in our member call interactions. Cohort analysis revealed that returning members were more likely to make a purchase compared to new members. However, complex details from sales conversations were not always recorded in our CRM, making it difficult to utilise this information in future calls. This oversight caused frustration and prevented our experts from accessing the complete history needed to provide tailored recommendations effectively.

Improving the customer experience with Snowflake Cortex AI

Our team had thousands of 40+ minute call transcripts that were not being analyzed. With the rise of Large Language Models (LLMs) that can quickly summarize text and run other NLP tasks, we embarked on the journey of finding a way to deliver better customer experiences for our returning customers.

As we started, here are a few of the challenges that surfaced fairly quickly:

Context window limits. Each call transcript was the length of a 100+ page document. For tasks like summarizing a call, it wasn't feasible to pass the whole document at once because it would quickly exceed the context window limits of every LLM.
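The usual workaround, which the post later refers to as chunking, is to split a long transcript into overlapping pieces that each fit the context window. A minimal sketch of that idea, where the 4-characters-per-token heuristic and the chunk/overlap sizes are illustrative assumptions rather than Compare Club's actual settings:

```python
# Hypothetical chunking sketch: split a long transcript into overlapping
# character windows sized to fit an LLM context budget. The ~4 chars/token
# heuristic and the default sizes below are assumptions for illustration.

def chunk_transcript(text: str, max_tokens: int = 4000, overlap_tokens: int = 200) -> list[str]:
    max_chars = max_tokens * 4          # rough heuristic: ~4 characters per token
    overlap_chars = overlap_tokens * 4  # overlap so sentences aren't cut at a boundary
    chunks, start = [], 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap_chars
    return chunks
```

Each chunk can then be summarized independently, with the per-chunk summaries merged in a final pass.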

Security and efficiency. To avoid managing LLM inference infrastructure on our own, running a summarization task for every call would have required moving large volumes of sensitive call recordings into a third-party service, which in turn would have required validation and security approvals.

Using Snowflake, we were able to tackle both of these fairly quickly. The LLMs in Cortex AI run inside Snowflake, so no data movement was required, and both the pre-processing tasks (e.g., chunking) and the LLM tasks were easily done with a little bit of SQL and Python.

At the time of evaluation, we tried a few LLMs and decided to use mistral-large because of its ability to handle larger context windows (more tokens), its performance at generating structured summaries in JSON format, and its open license, which helped ensure we could always run this same task on other platforms.

Running NLP tasks is really easy with Cortex AI: all you have to do is use the COMPLETE function to choose a model, add a prompt, and point to the data, which in our case is stored in the CALL_TRANSCRIPTIONS table.

High-level summary of the data pipeline and prompt preparation

1. We receive the raw transcript in JSON format, split out by word with an offset timestamp.

2. We then extract the spoken words and roll them up to the sentence level (using LATERAL FLATTEN), which gives a lot more flexibility when it comes to different types of analysis:

SELECT LISTAGG(w.value:word::string, ' ') AS sentence
FROM rawdata,
     LATERAL FLATTEN(input => rawdata.raw_words) AS w
GROUP BY ...;  -- group by your sentence boundary key

3. We then enrich the sentence-level transcript data with things like customer segment models, sales team details, and stage-of-call flags, to allow for customised prompt filter querying and diagnostics. This sentence-level model then feeds downstream use cases such as Streamlit transcript-interrogation apps, batch call-summary jobs, and ad hoc data exploration.
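The word-to-sentence rollup in step 2 can also be sketched in Python. The input shape below, a list of word records each carrying a "word" and an "offset", is an illustrative assumption about the raw transcript JSON, and the end-of-sentence punctuation check is a deliberately naive boundary rule:

```python
# Hypothetical Python equivalent of the LATERAL FLATTEN + LISTAGG rollup:
# concatenate word-level records into sentences, keeping each sentence's
# start offset. The input schema and boundary rule are assumptions.

def roll_up_sentences(raw_words: list[dict]) -> list[dict]:
    sentences, current, start = [], [], None
    for w in raw_words:
        if start is None:
            start = w["offset"]          # remember where the sentence began
        current.append(w["word"])
        if w["word"].endswith((".", "?", "!")):   # naive sentence boundary
            sentences.append({"offset": start, "sentence": " ".join(current)})
            current, start = [], None
    if current:                          # keep any trailing partial sentence
        sentences.append({"offset": start, "sentence": " ".join(current)})
    return sentences
```

Keeping the offset alongside each sentence is what makes later stage-of-call flags (e.g., pre-hold vs. post-hold) cheap to compute.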

Here's a preview of what our summarization prompt looked like. It sought to turn 100+ pages into a structured JSON output that extracted the following fields: initial goal, primary needs, loyalty & history, current issues, enthusiasm, pre-hold attitude, interest in other products, reaction to presentation, end-of-call objection, objection handle attempt, and objection handle response.

SELECT SNOWFLAKE.CORTEX.COMPLETE(
'mistral-large', CONCAT( 'Your task is to summarise and make notes, where
applicable, about some key points of a phone conversation.
You will be provided with a conversation, a set of instructions and
a desired output format. In the conversation, A denotes the Agent and
C denotes the Customer.',
'<conversation>', CONVERSATION_TRUNCATED, '</conversation>',
'<instructions>: Summarise the customer\'s initial goal when reviewing
health insurance options. Note the customer\'s primary needs when
the consultant assesses which product to select. Note whether the
customer brings up loyalty or any history of switching insurance.
Note the customer\'s concerns or issues with the current provider.
Determine whether the customer is enthusiastic about the consultant\'s approach.
Be concise and use dot point format in your response. </instructions>',
'<format>: Return the output in JSON format for each instruction.
Use these headings only in your JSON formatted output: initial goal,
primary needs, loyalty & history, current issues, enthusiasm. </format>')
) AS SUMMARY
FROM CALL_TRANSCRIPTIONS;
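Because the model's output is free text that should be JSON, it is worth validating it before loading it downstream. A minimal sketch, where the field names mirror the headings in the prompt above but the validation helper itself is our illustration, not part of Cortex AI:

```python
import json

# Illustrative sketch: validate a JSON summary returned by COMPLETE before
# loading it downstream. EXPECTED_FIELDS mirrors the prompt's headings.
EXPECTED_FIELDS = {
    "initial goal", "primary needs", "loyalty & history",
    "current issues", "enthusiasm",
}

def parse_summary(raw: str) -> dict:
    """Parse the model output, raising if any required heading is missing."""
    summary = json.loads(raw)            # raises ValueError on malformed JSON
    missing = EXPECTED_FIELDS - set(summary)
    if missing:
        raise ValueError(f"summary missing fields: {sorted(missing)}")
    return summary
```

Rows that fail validation can be routed to a retry queue rather than silently loaded with gaps.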

Activation: Surfacing summaries to the agents

Our agents, like almost every support representative or customer success manager, don't spend their day-to-day inside the data platform. To ensure the summaries were used to drive more engaging conversations with our repeat members, we built an automated pipeline that pushes our call summaries into Salesforce. Because Snowflake can deliver this flow efficiently, it was easy to turn our project from a proof of concept (POC) into a production application.

What’s next:

Lots of use cases are coming up now that we have delivered a valuable solution to the business using generative AI. With Cortex AI taking care of the heavy lifting in infrastructure and security, our data and analytics team is highly efficient at developing apps that continue to make us a data-driven organization. We look forward to seeing what additional functionality is developed in Cortex AI, and to trying out new models as they are seamlessly integrated, such as the new Llama 3 405B model, which we plan to test for more advanced use cases and for its ability to handle even larger context windows to process the lengthiest of our sales calls.
