Building a Text Search Application with Elasticsearch and FastAPI

Pritam Sonawane
5 min read · Jun 26, 2023


In this blog post, we will walk through the process of creating a text search application using Elasticsearch and FastAPI. Elasticsearch is a powerful search engine that efficiently indexes and searches text data. FastAPI is a modern web framework for building APIs with Python that integrates well with Elasticsearch.

We will start by ingesting data into Elasticsearch with a data ingestion script. The script uses the elasticsearch and sentence_transformers Python libraries to connect to a locally hosted Elasticsearch instance and compute text embeddings.

from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

# Connect to Elasticsearch
es = Elasticsearch("http://localhost:9200")

# Index name
index_name = "test1"

# Example data
data = [
    {"id": 1, "text": "The sun slowly set behind the mountains, casting a golden glow across the landscape. The air was crisp and cool, a gentle breeze rustling through the leaves of the trees. Birds chirped in the distance, their melodic songs filling the air. As I walked along the winding path, I couldn't help but marvel at the beauty of nature surrounding me. The scent of wildflowers wafted through the air, intoxicating and refreshing. It was a moment of tranquility, a moment to escape from the chaos of everyday life and immerse myself in the serenity of the natural world."},
    {"id": 2, "text": "The bustling city streets were filled with the sound of car horns and chatter. People hurried past, their faces lost in a sea of anonymity. Skyscrapers towered above, their reflective glass windows shimmering in the sunlight. The aroma of street food filled the air, mingling with the scent of exhaust fumes. Neon signs flashed with vibrant colors, advertising the latest products and services. It was a city that never slept, a constant whirlwind of activity and excitement. Amidst the chaos, I navigated through the crowds, searching for moments of connection and inspiration."},
    {"id": 3, "text": "The waves crashed against the shore, each one a powerful force of nature. The sand beneath my feet shifted with every step, as if it was alive. Seagulls soared overhead, their calls echoing through the salty air. The ocean stretched out before me, its vastness both awe-inspiring and humbling. I closed my eyes and listened to the symphony of the sea, the rhythm of the waves lulling me into a state of tranquility. It was a place of solace, a place where the worries of the world melted away and all that remained was the beauty of the natural world."},
    {"id": 4, "text": "The old bookstore was a treasure trove of knowledge and stories. Rows upon rows of bookshelves lined the walls, each one filled with books of every genre and era. The scent of aged paper and ink filled the air, creating an atmosphere of nostalgia and adventure. As I perused the shelves, my fingers lightly grazing the spines of the books, I felt a sense of wonder and curiosity. Each book held the potential to transport me to another world, to introduce me to new ideas and perspectives. It was a sanctuary for the avid reader, a place where imagination flourished and stories came to life."}
]

# Create Elasticsearch index and mapping
if not es.indices.exists(index=index_name):
    es_index = {
        "mappings": {
            "properties": {
                "text": {"type": "text"},
                "embedding": {"type": "dense_vector", "dims": 768}
            }
        }
    }
    es.indices.create(index=index_name, body=es_index, ignore=[400])

# Upload documents to Elasticsearch with text embeddings
model = SentenceTransformer('quora-distilbert-multilingual')

for doc in data:
    # Calculate text embeddings using the SentenceTransformer model
    embedding = model.encode(doc["text"], show_progress_bar=False)

    # Create document with text and embedding
    document = {
        "text": doc["text"],
        "embedding": embedding.tolist()
    }

    # Index the document in Elasticsearch
    es.index(index=index_name, id=doc["id"], body=document)

We first import the necessary libraries in the data ingestion script, including Elasticsearch and SentenceTransformer. We establish a connection to Elasticsearch using the Elasticsearch URL. We define the index_name variable to hold the name of our Elasticsearch index.

Next, we define the example data as a list of dictionaries, where each dictionary represents a document with an ID and a text. These documents simulate the data we want to search for. You can customize the script based on your specific data source and metadata extraction requirements.

We check if the Elasticsearch index exists and, if not, create it with the appropriate mapping. The mapping defines the field types for our documents, including the text field as text and the embedding field as a dense_vector with a dimension of 768.

We initialize the SentenceTransformer model with the 'quora-distilbert-multilingual' pre-trained model for text embedding. This model can encode text into dense vectors of length 768.

For each document in the example data, we calculate the text embedding using the model.encode() function and store it in the embedding variable. We create a document dictionary with the text and embedding fields. Finally, we index the document in Elasticsearch using the es.index() function.
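For larger datasets, calling es.index() once per document is slow because each call is a separate HTTP round trip. The elasticsearch.helpers.bulk helper can index a batch of documents in a single request. Here is a minimal sketch of building the bulk actions; the document contents and embeddings below are placeholders, and in the real script the vectors would come from model.encode() as above:

```python
# With a running cluster you would also need:
# from elasticsearch import Elasticsearch, helpers

# Placeholder documents with dummy 768-dim embeddings; in the real
# script each embedding is model.encode(doc["text"]).tolist()
docs = [
    {"id": 1, "text": "first document", "embedding": [0.0] * 768},
    {"id": 2, "text": "second document", "embedding": [0.0] * 768},
]

# Build one bulk action per document, targeting the same index as above
actions = [
    {
        "_index": "test1",
        "_id": doc["id"],
        "_source": {"text": doc["text"], "embedding": doc["embedding"]},
    }
    for doc in docs
]

# With a live connection, a single call indexes the whole batch:
# es = Elasticsearch("http://localhost:9200")
# helpers.bulk(es, actions)
```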

Now that we have ingested the data into Elasticsearch, let’s move on to creating the search API using FastAPI.

from fastapi import FastAPI
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

# Connect to Elasticsearch
es = Elasticsearch("http://localhost:9200")

# Load the embedding model once at startup rather than on every request
model = SentenceTransformer('quora-distilbert-multilingual')

app = FastAPI()

@app.get("/search/")
async def search(query: str):
    # Perform text embedding using SentenceTransformer
    embedding = model.encode(query, show_progress_bar=False)

    # Build the Elasticsearch script query
    script_query = {
        "script_score": {
            "query": {"match_all": {}},
            "script": {
                "source": "cosineSimilarity(params.query_vector, 'embedding') + 1.0",
                "params": {"query_vector": embedding.tolist()}
            }
        }
    }

    # Execute the search query
    search_results = es.search(index="test1", body={"query": script_query})

    # Process and return the search results
    results = search_results["hits"]["hits"]
    return {"results": results}
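The + 1.0 in the script_score source is there because Elasticsearch requires document scores to be non-negative: cosine similarity ranges over [-1, 1], so adding 1.0 shifts every score into [0, 2]. A quick plain-Python illustration of the scoring function (a standalone sanity check, not part of the app):

```python
import math

def cosine_similarity(a, b):
    # Dot product of the two vectors divided by the product of their norms
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Opposite vectors: similarity -1.0, so the shifted score is 0.0
score_opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0]) + 1.0

# Identical vectors: similarity 1.0, so the shifted score is 2.0
score_identical = cosine_similarity([1.0, 0.0], [1.0, 0.0]) + 1.0
```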

To run the FastAPI application, save the code in a file (e.g., main.py) and execute the following command in your terminal:

uvicorn main:app --reload

This will start the FastAPI development server. You can then call the search endpoint at http://localhost:8000/search/, passing your search text in the query parameter (for example, http://localhost:8000/search/?query=ocean+waves). The results are returned as a JSON response.
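Any HTTP client can call the endpoint. As a small sketch, here is how the request URL could be built from Python; the query text is just an illustration:

```python
from urllib.parse import urlencode

query = "quiet beach at sunset"  # example query text
url = "http://localhost:8000/search/?" + urlencode({"query": query})

# With the server running, the same request from the command line:
# curl "http://localhost:8000/search/?query=quiet+beach+at+sunset"
```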

Make sure to customize the code according to your requirements, such as adding error handling, authentication, and modifying the response structure.

Future Scope — NER Integration for Advanced Search:

To further enhance the search capabilities of our application, we can integrate Named Entity Recognition (NER) into the system. NER is a powerful technique in Natural Language Processing (NLP) that identifies and extracts named entities, such as persons, organizations, locations, and more, from text data.

Here are some future possibilities for NER integration:

  1. Entity-based Search: Utilize the identified named entities to filter search results based on specific entity types. For example, users can search for articles specifically mentioning a particular organization or person.
  2. Entity Suggestions: Leverage the identified entities to provide auto-suggestions or recommendations as users type their queries. This can help users discover related entities and refine their search queries.
  3. Entity-based Analytics: Aggregate search statistics based on named entities to gain insights into popular entities, trends, or correlations. This information can be useful for content analysis, user behaviour analysis, or personalized recommendations.
  4. Entity-based Visualization: Visualize search results using graphs or charts to display relationships between entities, allowing users to explore connections and patterns within the data.
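As a sketch of the first idea: if each document were indexed with an additional, hypothetical entities keyword field populated by an NER model at ingestion time, the script-score query could be wrapped around a bool filter so that only documents mentioning a given entity are ranked by vector similarity. Something along these lines (the field name and entity value are assumptions for illustration):

```json
{
  "query": {
    "script_score": {
      "query": {
        "bool": {
          "filter": [{"term": {"entities": "Acme Corp"}}]
        }
      },
      "script": {
        "source": "cosineSimilarity(params.query_vector, 'embedding') + 1.0",
        "params": {"query_vector": "<768-dim query embedding>"}
      }
    }
  }
}
```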
