A live chatbot with Databricks in 3 days

Benoit Pothier
5 min read · Apr 23, 2024


Image generated with AI

How we built a basic product recommendation chatbot, live on an e-commerce website, in a few days using RAG and a vector database.

The use case

Kiliba, the company I work for, has a mission to empower enterprises worldwide to engage meaningfully with their customers. Our historical product sends personalised marketing emails to customers on autopilot.
I had long wanted to go beyond emails and provide a chatbot directly on the websites of online shops. So when Databricks invited us to participate in their GenAI hackathon, I jumped at the chance to develop this idea.
We saw an opportunity to take Kiliba's personalization to the next level through the chatbot. By leveraging the vast amount of data Kiliba collects, the chatbot can provide instant, tailored responses to customer inquiries. This not only enhances the customer experience but also has the potential to increase sales and foster long-term customer loyalty.

Kiliba web chatbot

Databricks Support for RAG Applications

Databricks provides a comprehensive set of tools and technologies that greatly facilitate the development and deployment of RAG (Retrieval-Augmented Generation) applications. By leveraging Databricks’ powerful data management, machine learning, and serving capabilities, we were able to efficiently build and deploy our chatbot POC during the hackathon.

Existing Databricks technologies that we utilized include:

  1. Unity Catalog: Unity Catalog provides a unified governance layer for all data assets across clouds. It allowed us to easily manage and access the product data needed to personalize the chatbot responses.
  2. Notebooks: Databricks Notebooks offered an interactive, collaborative environment for data exploration, model development, and testing.
  3. Delta Tables: Delta Tables, part of Delta Lake, provided a reliable and scalable storage layer for our data. Key features of Delta Tables include ACID transactions, schema enforcement, and time travel capabilities.

In addition to these existing technologies, Databricks has recently introduced several new features that further support RAG applications:

  1. Vector Search: Vector search is a technique that allows for efficient similarity search over large datasets. In the context of our chatbot, it enables us to quickly find products that are most relevant to a user’s query, based on the similarity between the query and the product vectors stored in our database.
  2. Model Serving: Model serving refers to the process of deploying machine learning models in a production environment, making them available to serve predictions or generate responses in real-time. Databricks’ Model Serving feature simplifies this process by allowing us to deploy our chatbot model as a REST endpoint, which can be easily integrated into online stores.
  3. MLflow for RAG Chain: MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. It provides tools for tracking experiments, packaging code into reproducible runs, and sharing and deploying models. In our project, MLflow played a crucial role in managing the deployment and serving of our chatbot model. It has been extended lately to handle RAG chains. This allows us to package, deploy, and manage the entire RAG pipeline, from data preprocessing to model serving, as a single entity.
  4. LLM Model Serving: Databricks’ LLM Model Serving offers a unique interface to both open-source and proprietary models, easing development and governance.
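To make the vector search idea concrete, here is a minimal, self-contained sketch of similarity search over product embeddings. The vectors and product names are invented toy data; in production the embeddings come from a real embedding model and the search runs inside Databricks Vector Search rather than in plain Python:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy product embeddings (real ones would have hundreds of dimensions).
product_vectors = {
    "wool sweater": [0.9, 0.1, 0.0],
    "running shoes": [0.1, 0.9, 0.2],
    "rain jacket": [0.7, 0.3, 0.1],
}

# Pretend this is the embedding of the query "something warm to wear".
query_vector = [0.9, 0.1, 0.0]

# Rank products by similarity to the query, most similar first.
ranked = sorted(
    product_vectors.items(),
    key=lambda item: cosine_similarity(query_vector, item[1]),
    reverse=True,
)
print(ranked[0][0])  # the most relevant product for this query
```

A vector search index does exactly this ranking, but with approximate-nearest-neighbour data structures so it stays fast over millions of products.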

The Implementation

The architecture of our chatbot follows a classical Retrieval-Augmented Generation (RAG) approach. Here is a diagram of the architecture:

Kiliba RAG architecture

Here’s a breakdown of the key components of this architecture:

Vector Database Section

This is the data preparation part.

The “Product Database” in our architecture is a centralized repository that stores all the relevant information about the products available on the online stores powered by Kiliba. This data is collected through the Kiliba app, which integrates with the online shops’ backend systems. The database includes product details such as titles, descriptions, images, prices, and categories.

Using OpenAI’s embedding models, we transform the product data into vector embeddings. These embeddings are then stored in a “Vector Database” for efficient similarity search. Databricks creates this vector database for you on top of Delta tables. A batch job can run periodically to keep the embeddings up to date with any changes in the product catalog.
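The periodic sync job can be sketched as follows. This is a simplified stand-in: the `fake_embed` function replaces the real OpenAI embedding call, and a plain dictionary replaces the Delta-backed vector index, but the upsert-and-delete logic is the point:

```python
def fake_embed(text: str) -> list[float]:
    # Stand-in for an OpenAI embedding call: a deterministic toy vector.
    return [len(text) % 7, text.count("a"), text.count("e")]

def sync_embeddings(
    catalog: dict[str, str], store: dict[str, list[float]]
) -> dict[str, list[float]]:
    """Upsert embeddings for current products, drop deleted ones."""
    for product_id, description in catalog.items():
        store[product_id] = fake_embed(description)
    for product_id in list(store):
        if product_id not in catalog:
            del store[product_id]  # product no longer in the shop
    return store

store: dict[str, list[float]] = {}
catalog = {"sku-1": "blue rain jacket", "sku-2": "leather wallet"}
sync_embeddings(catalog, store)

del catalog["sku-2"]               # product removed from the shop
catalog["sku-3"] = "wool scarf"    # new product added
sync_embeddings(catalog, store)
print(sorted(store))               # only current products remain
```

A real job would also skip unchanged products (for example by hashing descriptions) to avoid paying for embeddings that haven't changed.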

Web Chatbot Section

This is the web part, encompassing the frontend and the backend. Users interact with a “Chatbot widget” embedded on the online store’s website. When a user asks a question, the chatbot sends a JSON payload containing the question to the backend, then receives a JSON response with the appropriate answer to display to the user.
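As a sketch, the request and response could look like this. The field names here are illustrative, not Kiliba's actual schema:

```python
import json

# Shape of the payload the widget might send to the backend.
request_payload = json.dumps({
    "session_id": "abc-123",
    "question": "Do you have a warm jacket for winter?",
})

# Shape of the response the widget renders: an answer plus product buttons.
response_payload = json.loads(
    '{"answer": "Yes! Here is a great option.",'
    ' "products": [{"title": "Down Jacket", "url": "/p/down-jacket"}]}'
)

# The widget would display the answer and one button per product.
print(response_payload["products"][0]["title"])
```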

Chatbot Endpoint Section:

The user’s question is sent to a “Chatbot endpoint” for processing. This endpoint is served serverlessly by Databricks from a model published and promoted to production through MLflow. The endpoint retrieves the top 3 most similar products based on the user’s question, using the vector embeddings stored in the database. These are passed to an LLM (we tested Llama and Mistral) to generate an appropriate response based on the user’s question and the retrieved product information.
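The retrieve-then-prompt step inside the endpoint can be sketched in a self-contained way. Toy 2-dimensional vectors stand in for the real embeddings, and the prompt wording is invented, not what we shipped:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a))
                  * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec: list[float], products: list[dict], k: int = 3) -> list[dict]:
    """Return the k products most similar to the query embedding."""
    return sorted(products, key=lambda p: cosine(query_vec, p["vec"]),
                  reverse=True)[:k]

def build_prompt(question: str, products: list[dict]) -> str:
    """Assemble the retrieved products into the LLM prompt."""
    context = "\n".join(f"- {p['title']}: {p['description']}" for p in products)
    return ("You are a shopping assistant. Using only the products below, "
            f"answer the customer.\n\nProducts:\n{context}\n\n"
            f"Question: {question}")

products = [
    {"title": "Down Jacket", "description": "Warm winter jacket", "vec": [0.9, 0.1]},
    {"title": "Sandals", "description": "Summer footwear", "vec": [0.1, 0.9]},
    {"title": "Wool Hat", "description": "Knitted winter hat", "vec": [0.8, 0.2]},
    {"title": "Swim Shorts", "description": "Beachwear", "vec": [0.0, 1.0]},
]
query_vec = [1.0, 0.0]  # would come from the same embedding model as the products
hits = top_k(query_vec, products, k=3)
prompt = build_prompt("What should I wear this winter?", hits)
```

In the deployed version, this whole chain (retrieval plus the LLM call) is what gets packaged and served as a single MLflow model behind the REST endpoint.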

Conclusion and future work

I’m thrilled to share that we managed to build a working proof of concept for our chatbot, live on a real shop, in just 3–4 days of effort! The hackathon was an incredible experience, and it showcased the power and ease of use of Databricks’ tools.

You can see in the screenshot below that we were able to recommend products relevant to the customer’s query. Buttons are generated that, when clicked, take you directly to the product page.

Answer to a website visitor request

One limitation we encountered during the hackathon was that MLflow serving didn’t support all the features of the LangChain framework at the time. In particular, we were not able to use memory to generate answers that take the chat history into account. With memory, the chatbot could maintain context across multiple user interactions, allowing for more natural and coherent conversations: users could ask follow-up questions or refer back to previous topics without repeating themselves. We could have worked around this by passing the chat history back and forth between the widget and the endpoint. This is just the beginning, and we’re eager to push the boundaries of what’s possible as the tools evolve.
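The back-and-forth workaround can be sketched like this: the widget keeps the transcript client-side and resends it with every request, so the stateless endpoint still sees the full context. The `answer` function is a stand-in for the real LLM call:

```python
def answer(question: str, history: list[dict]) -> str:
    # Stand-in for the served model; a real one would feed the history
    # into the prompt alongside the new question.
    return f"(answer using {len(history)} prior turns)"

# The widget accumulates the transcript and resends it on each request.
history: list[dict] = []
for question in ["Do you sell jackets?", "What about in blue?"]:
    reply = answer(question, history)
    history.append({"role": "user", "content": question})
    history.append({"role": "assistant", "content": reply})

print(len(history))  # both turns (user + assistant) are retained
```

The trade-off is payload size: the transcript grows with every turn, so a real implementation would truncate or summarize older turns.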

Overall, the hackathon was a great success, and it demonstrated the immense potential of AI in enhancing the online shopping experience. We’re excited to continue refining and expanding our chatbot, and we look forward to seeing how it can transform the way customers interact with online stores.

Stay tuned for more updates on this exciting project!

If you want to dig further:


Benoit Pothier

Tech junkie, CTO, and entrepreneur. Working on AI and machine learning to shake up customer engagement. Cut my teeth in telecom, gaming, and robotics. 🚀