Quick tip: SingleStoreDB integration with LangChain

Akmal Chaudhri
3 min read · Jun 27, 2023

A quick example to show the integration of SingleStoreDB with LangChain

Abstract

Recently, SingleStoreDB has been integrated with LangChain. In this short article, we’ll walk through a quick example to demonstrate the integration and how easy it is to use these two technologies together.

Introduction

LangChain is a software development framework designed to simplify the creation of applications using Large Language Models (LLMs). Here, we’ll streamline the example from a previous article, which was written before the SingleStoreDB LangChain integration was announced, and show how easy it is to use SingleStoreDB and LangChain together.

As described in the previous article, we’ll follow the instructions to create a SingleStoreDB Cloud account, Workspace Group, Workspace, and Notebook.

Fill out the Notebook

First, we’ll install some libraries:

!pip install langchain --quiet
!pip install openai --quiet
!pip install singlestoredb --quiet
!pip install tiktoken --quiet
!pip install unstructured --quiet

Next, we’ll read in a PDF document. This is an article by Neal Leavitt titled “Whatever Happened to Object-Oriented Databases?” OODBs were an emerging technology during the late 1980s and early 1990s. We’ll add leavcom.com to the firewall by selecting the Edit Firewall option in the top right. Once the address has been added to the firewall, we’ll read the PDF file:

from langchain.document_loaders import OnlinePDFLoader

loader = OnlinePDFLoader("http://leavcom.com/pdf/DBpdf.pdf")

data = loader.load()

LangChain’s OnlinePDFLoader downloads the PDF and extracts its text in a single step, which makes reading the file easier.

Next, we’ll get some data on the document:

from langchain.text_splitter import RecursiveCharacterTextSplitter

print(f"You have {len(data)} document(s) in your data")
print(f"There are {len(data[0].page_content)} characters in your document")

The output should be:

You have 1 document(s) in your data
There are 13040 characters in your document

We’ll now split the document into chunks of up to 2,000 characters each, with no overlap between chunks:

text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=0)
texts = text_splitter.split_documents(data)

print(f"You have {len(texts)} chunks")
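To make the two splitter parameters concrete, here is a minimal pure-Python sketch of fixed-size splitting. It is a simplified illustration under our own assumptions, not LangChain’s algorithm: RecursiveCharacterTextSplitter prefers to break on separators such as paragraph and line breaks, so its chunks are often shorter than chunk_size. The function name split_fixed is hypothetical.

```python
def split_fixed(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    Hypothetical helper, for illustration only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


sample = "x" * 13040  # stand-in text with the same length as our document
chunks = split_fixed(sample, chunk_size=2000, chunk_overlap=0)
print(len(chunks), len(chunks[-1]))  # prints: 7 1040
```

A purely fixed-size cut of 13,040 characters gives seven chunks, the last one holding the remaining 1,040 characters. The count reported by LangChain may differ, because it breaks on separators rather than at exact offsets.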

Next, we’ll set our OpenAI API Key:

import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

and use LangChain’s OpenAIEmbeddings:

from langchain.embeddings import OpenAIEmbeddings

embedder = OpenAIEmbeddings()

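Now we’ll store the text with the vector embeddings in the database system. With the LangChain integration, this takes a single call: from_documents computes an embedding for each chunk and writes the text and its vector to the pdf_docs2 table: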

from langchain.vectorstores import SingleStoreDB

os.environ["SINGLESTOREDB_URL"] = "admin:<password>@<host>:3306/pdf_db"

docsearch = SingleStoreDB.from_documents(
    texts,
    embedder,
    table_name="pdf_docs2",
)

We’ll replace the <password> and <host> with the values from our SingleStoreDB Cloud account.
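If we prefer not to type the password directly into a notebook cell, the connection string can be assembled from its parts. The sketch below assumes the same user:password@host:port/database layout shown above; the helper name build_singlestore_url and the placeholder values are hypothetical.

```python
def build_singlestore_url(
    user: str,
    password: str,
    host: str,
    port: int = 3306,
    database: str = "pdf_db",
) -> str:
    """Assemble a connection string in the layout used in this article.

    Hypothetical helper: it only concatenates the parts and does not
    validate or escape them.
    """
    return f"{user}:{password}@{host}:{port}/{database}"


# Placeholder values, not real credentials.
url = build_singlestore_url("admin", "secret", "example-host")
print(url)  # prints: admin:secret@example-host:3306/pdf_db
```

In the notebook, the password could be read with getpass.getpass(), as we did for the OpenAI API Key, and the result assigned to os.environ["SINGLESTOREDB_URL"].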

We can now ask a question, as follows:

query_text = "Will object-oriented databases be commercially successful?"

docs = docsearch.similarity_search(query_text)

print(docs[0].page_content)

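Behind the scenes, similarity_search embeds the query with the same embedder and returns the stored chunks most similar to it (four by default), best match first; we print the first one. Again, the integration handles the database query for us.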
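Conceptually, a similarity search ranks stored vectors by how close they are to the query vector. The sketch below is a toy in-memory version that uses the dot product of unit-length vectors; in the real integration, the comparison runs inside the database, where SingleStoreDB provides a DOT_PRODUCT function for this purpose. The three-dimensional vectors and the helper names are made up for illustration, whereas real embeddings have many more dimensions.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length, so dot product equals cosine similarity."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def top_k(query_vec, rows, k=4):
    """Return the texts of the k rows whose vectors score highest against the query.

    rows is a list of (text, vector) pairs.
    """
    ranked = sorted(rows, key=lambda row: dot(query_vec, row[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy "embeddings", invented for this example.
rows = [
    ("OO databases found niche markets", normalize([0.9, 0.1, 0.0])),
    ("Relational databases dominate", normalize([0.5, 0.5, 0.1])),
    ("Recipe for lemon cake", normalize([0.0, 0.1, 0.9])),
]
query = normalize([1.0, 0.2, 0.0])

print(top_k(query, rows, k=2))
```

The two database-related texts rank ahead of the unrelated one, which is the behaviour we rely on when we take docs[0] as the best match.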

Finally, we can pass the question and the best-matching text to a GPT model to generate an answer:

import openai

prompt = f"The user asked: {query_text}. The most similar text from the document is: {docs[0].page_content}"

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ]
)

print(response['choices'][0]['message']['content'])

Here is some example output:

While object-oriented databases are still in use and have solid niche markets,
they have not gained as much commercial success as relational databases.
Observers previously anticipated that OO databases would surpass relational
databases, especially with the emergence of multimedia data on the internet,
but this prediction did not come to fruition. However, OO databases continue
to be used in specific fields, such as CAD and telecommunications. Experts
have varying opinions on the future of OO databases, with some predicting
further decline and others seeing potential growth.
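The prompt above uses only the single best match. Since similarity_search returns several results, one possible variation is to include more than one of them in the prompt. The sketch below only builds the messages list; the helper name build_messages and the max_chars budget are hypothetical choices, not part of LangChain or the OpenAI library.

```python
def build_messages(question: str, passages: list[str], max_chars: int = 6000) -> list[dict]:
    """Build chat messages that include as many passages as fit in max_chars.

    Hypothetical helper: passages are added in order (best match first)
    until the character budget would be exceeded.
    """
    parts = []
    used = 0
    for i, passage in enumerate(passages, start=1):
        if used + len(passage) > max_chars:
            break
        parts.append(f"[{i}] {passage}")
        used += len(passage)
    context = "\n\n".join(parts)
    user_content = (
        f"The user asked: {question}\n\n"
        f"The most similar passages from the document are:\n\n{context}"
    )
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_content},
    ]


# Stand-in passages; in the notebook these would be [d.page_content for d in docs].
messages = build_messages(
    "Will object-oriented databases be commercially successful?",
    ["First passage.", "Second passage."],
)
print(messages[1]["content"])
```

The resulting list could then be passed as the messages argument in the call shown earlier.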

Summary

Comparing our solution in this article with the previous one, we can see that the LangChain integration provides a simpler solution. For example, we did not need to write any SQL statements. The framework abstracted the database access, allowing us to focus on the business problem and saving us time.

Akmal Chaudhri

I help build global developer communities and raise awareness of technology through presentations and technical writing.