Demystifying a Typical Search Problem Using Inverted Index

If you have only heard of inverted indexes and don't know much about them, this is for you

Cinto

Published in

The Startup

6 min read · Nov 12, 2020

An index makes it possible to search quickly for words or text across a collection of documents. Indexes are widely used in:

→ Search engines

→ Low-latency big data analytics systems such as Druid and Pinot

A typical web search problem can be decomposed into three major components:

Crawling: Gathering the necessary web content
Indexing: Building the index. In this article, we will review the high-level steps involved in building an inverted index
Retrieval: Fetching the required information from the indexed documents. This usually demands a sub-second response time
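
To make the indexing and retrieval steps concrete, here is a minimal sketch of an inverted index in Python. It assumes a tiny in-memory collection, a naive lowercase tokenizer, and AND semantics for multi-word queries; the names `tokenize`, `build_inverted_index`, and `search` are illustrative, not taken from any particular search engine.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Indexing: map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Retrieval: return IDs of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Look up each term's posting set, then intersect them (AND semantics).
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical sample documents standing in for crawled content.
docs = {
    1: "Druid is a low latency analytics store",
    2: "Search engines crawl and index the web",
    3: "An inverted index makes search fast",
}
index = build_inverted_index(docs)
```

With this sample data, `search(index, "index search")` looks up two posting sets and intersects them instead of scanning every document, which is what makes sub-second retrieval feasible. Production systems add much more on top, such as ranking, compression, and on-disk storage.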

Crawling

Crawling is the first step in any indexing process: it scans the sources for content. Before building inverted indexes, we must first gather the collection of documents over which those indexes will be built. A few points to keep in mind:

The crawler should not burden the web servers to the point that it affects the application they serve
Many crawlers are distributed systems…
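
One common way to avoid burdening the servers being crawled is to enforce a minimum delay between requests to the same host. The sketch below is only an illustration under that assumption: the class name `PolitenessGate`, its parameters, and the one-second default are hypothetical, and real crawlers usually also honor each site's robots.txt rules.

```python
import time
from urllib.parse import urlparse


class PolitenessGate:
    """Enforce a minimum delay between requests to the same host."""

    def __init__(self, min_delay_seconds=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay_seconds
        self.clock = clock    # injectable so the gate can be tested without waiting
        self.sleep = sleep
        self.last_request = {}  # host -> time of the most recent request

    def wait_turn(self, url):
        """Block until the URL's host may be contacted again; return seconds waited."""
        host = urlparse(url).netloc
        last = self.last_request.get(host)
        waited = 0.0
        if last is not None:
            remaining = self.min_delay - (self.clock() - last)
            if remaining > 0:
                self.sleep(remaining)
                waited = remaining
        # Record the time at which this host is contacted.
        self.last_request[host] = self.clock()
        return waited
```

A crawler loop would call `wait_turn(url)` just before fetching each URL. Requests to different hosts are not delayed by one another, so overall throughput can stay high while each individual server sees a bounded request rate.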

Written by Cinto