How LLMs Transformed Our Customer Service with Efficient Query Handling and Cost Reduction

Dr. Manoj Kumar Yadav
redbus India Blog

--

In today’s fast-paced, digital-first world, customer service is no longer about simply answering questions; it’s about solving problems efficiently and accurately. With the introduction of Large Language Models (LLMs) like GPT, we’ve experienced a seismic shift in how customer interactions are handled. However, one of the biggest challenges we faced was optimizing the model’s performance without letting costs spiral.

Here’s how we tackled this issue, reduced costs tenfold, and improved customer experience by moving away from traditional decision trees.

The Old Way: Decision Trees and Decision Overload
Historically, customer service systems relied heavily on decision trees — automated flows designed to lead users through a series of predefined questions and options. While this was an effective way to solve problems, it often led to decision fatigue, as customers were forced to navigate complex trees before arriving at their solutions.

This approach was functional but inefficient. It required a lot of manual programming to set up, and the user experience was far from seamless. Enter LLMs, with their ability to interpret queries in natural language and bypass those tedious flows. However, there was one catch: high token counts and heavy model usage drove costs skyward.

The Game-Changer: Using a Vector DB for Function Definitions
We knew we had to innovate to stay competitive, so we built a custom Vector Database (DB) on MapDB, a lightweight, high-performance embedded storage engine for Java. This wasn’t your typical Vector DB, though. Instead of using it to store customer data, we used it to hold all of our function definitions.

Here’s how it worked:
- Storing Function Definitions: Instead of storing the code itself in the Vector DB, we stored only the function definitions, i.e. the metadata the model needs to decide which function fits a user’s query. This allowed us to quickly retrieve the relevant functions without involving the actual function code at this stage (a sketch of such a store follows below).
- Code Stays with the Language: The implementations of these functions remained embedded with the programming language, in our application code. This meant that when a relevant function was identified, it could be executed directly without bloating the input token count.
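
To make this concrete, here is a minimal sketch of what such a store can look like, assuming MapDB 3.x (org.mapdb:mapdb) on the classpath. The class name FunctionCatalog, the map names, and the split into one map for embeddings and one for JSON definitions are illustrative choices for this post, not our production schema.

```java
// Illustrative sketch, not production code: a MapDB-backed catalog that keeps
// function *definitions* (JSON metadata) plus one embedding per function.
// The actual function implementations stay in the application code.
import org.mapdb.DB;
import org.mapdb.DBMaker;
import org.mapdb.Serializer;

import java.util.Map;

public class FunctionCatalog {

    private final DB db;
    private final Map<String, double[]> embeddings;  // function name -> embedding vector
    private final Map<String, String> definitions;   // function name -> JSON definition

    public FunctionCatalog(String path) {
        // One MapDB file holds both maps; there is no separate vector-store server to run.
        this.db = DBMaker.fileDB(path).transactionEnable().make();
        this.embeddings = db
                .hashMap("function-embeddings", Serializer.STRING, Serializer.DOUBLE_ARRAY)
                .createOrOpen();
        this.definitions = db
                .hashMap("function-definitions", Serializer.STRING, Serializer.STRING)
                .createOrOpen();
    }

    /** Register a function definition (e.g. an OpenAI-style tool schema) with its embedding. */
    public void register(String name, String jsonDefinition, double[] embedding) {
        definitions.put(name, jsonDefinition);
        embeddings.put(name, embedding);
        db.commit();
    }

    public Map<String, double[]> embeddings() { return embeddings; }

    public String definition(String name) { return definitions.get(name); }

    public void close() { db.close(); }
}
```

Because MapDB exposes its collections as ordinary java.util.Map views backed by a single file, the entire catalog lives inside the service process, which is a large part of why it stays lightweight.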

Smart Query Handling: Reducing Token Count with Relevance-Based Retrieval
One of the significant challenges of using LLMs in a customer-facing environment is managing token counts. Every token matters because it directly impacts costs. In our earlier versions, we included more information than necessary in each query, leading to inflated token counts and, consequently, higher costs.

With the Vector DB, we were able to pick only the relevant set of functions based on the user’s query. This meant that instead of sending a massive chunk of input to the model, we sent only what was needed. The result? A substantial drop in input token counts and a more streamlined response generation process.
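
A sketch of that retrieval step, continuing the illustrative names from above: the user’s query is embedded (obtaining the query embedding from your embedding endpoint is outside this snippet), compared against every stored function-definition vector with cosine similarity, and only the top-k function names survive.

```java
// Illustrative sketch: pick the k function definitions most relevant to a query.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class FunctionRetriever {

    /** Return the names of the k function definitions closest to the query embedding. */
    public static List<String> topK(double[] queryEmbedding,
                                    Map<String, double[]> functionEmbeddings,
                                    int k) {
        record Scored(String name, double score) {}
        List<Scored> scored = new ArrayList<>();
        for (Map.Entry<String, double[]> e : functionEmbeddings.entrySet()) {
            scored.add(new Scored(e.getKey(), cosine(queryEmbedding, e.getValue())));
        }
        scored.sort(Comparator.comparingDouble(Scored::score).reversed());
        return scored.stream().limit(k).map(Scored::name).toList();
    }

    // Cosine similarity between two vectors of the same length.
    private static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot   += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB) + 1e-12);
    }
}
```

For a catalog of at most a few hundred functions, a brute-force scan like this is usually fast enough that no approximate-nearest-neighbour index is needed.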

Cost Reduction: Switching to GPT-4o & Function Embedding
While optimizing token usage was a big win, it wasn’t enough. We also needed to look at the underlying model itself. Initially, we were using GPT-4, which, while powerful, came at a high price. Switching to GPT-4o allowed us to maintain the same level of accuracy and capability but at a significantly reduced cost.

By combining:
- our efficient Vector DB approach on MapDB,
- function-based retrieval of only the relevant definitions, and
- a switch to a lower-cost, high-performance model (GPT-4o),

we reduced our costs tenfold compared to our first implementation. This transformation was not just about cost savings but also about scalability. We could now serve more customers faster, more efficiently, and without breaking the bank.
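
To show how the pieces fit together, here is a hedged sketch of the combined flow: retrieve the few relevant definitions from the catalog, send only those as tools, and target the lower-cost model. It reuses the illustrative FunctionCatalog and FunctionRetriever classes from above, assumes each stored definition is already a complete tool object (i.e. {"type":"function","function":{...}}), and is shown against a generic OpenAI-compatible chat-completions endpoint; the Azure OpenAI service linked in the reference below accepts the same payload behind a deployment-specific URL and an api-key header.

```java
// Illustrative sketch: send only the retrieved function definitions as tools,
// and target a lower-cost model. Class and method names come from the earlier
// sketches in this post, not from our production code.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.stream.Collectors;

public class ChatWithRelevantTools {

    public static String answer(String userQuery,
                                double[] queryEmbedding,   // produced by your embedding endpoint
                                FunctionCatalog catalog,
                                String apiKey) throws Exception {
        // 1. Pick only the few function definitions relevant to this query.
        List<String> relevant = FunctionRetriever.topK(queryEmbedding, catalog.embeddings(), 3);

        // 2. Build the tools array from the stored JSON definitions.
        String tools = relevant.stream()
                .map(catalog::definition)
                .collect(Collectors.joining(",", "[", "]"));

        // 3. Call the lower-cost model with a small prompt and a small tool list.
        //    (Minimal quote escaping for the sketch; use a JSON library in real code.)
        String body = """
                {"model":"gpt-4o",
                 "messages":[{"role":"user","content":"%s"}],
                 "tools":%s}
                """.formatted(userQuery.replace("\"", "\\\""), tools);

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.openai.com/v1/chat/completions"))
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // The response contains either a direct answer or a tool call that the
        // application executes locally, keeping function code out of the prompt.
        return response.body();
    }
}
```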

Performance Improvements: Happier Customers, Smoother Service

In addition to the cost savings, this approach significantly boosted key performance metrics. By switching from decision trees to an intelligent query handling system powered by LLMs, we achieved remarkable results:

- 20% Improvement in CSAT: Our Customer Satisfaction (CSAT) scores improved by 20%, reflecting happier customers who were receiving faster and more accurate answers.
- 20% Reduction in Agent Connect: Because the LLMs could handle more queries independently, we saw a 20% drop in the need for customers to be escalated to human agents.
- 50% Reduction in Chat Drop Rate: One of the most striking changes was a 50% reduction in chat drops, where users leave the conversation prematurely. With more precise and relevant responses, users stayed engaged and received the help they needed without frustration.

Final Thoughts
Large Language Models have indeed transformed the way we serve our customers, but the real magic lies in how you use them. By strategically building our own Vector DB on MapDB, optimizing token usage, and choosing the right model, we have revolutionized our customer service experience. The days of cumbersome decision trees are gone, and with this innovative approach, we’re providing customers with the answers they need — faster, cheaper, and smarter.

If you’re looking to implement LLMs for your business, remember: it’s not just about adopting the latest technology; it’s about using that technology efficiently. The right optimizations can drastically reduce costs while enhancing customer satisfaction — a win-win for everyone involved.

Reference:

https://azure.microsoft.com/en-in/products/ai-services/openai-service

--


Doctor of Business Administration | VP - Engineering at redBus | Data Engineering | ML | Servers | Serverless | Java | Python | Dart | 3D/2D