Demystifying a Typical Search Problem Using an Inverted Index

If you have only just heard about it and don't know much about it yet, this is for you

Cinto
The Startup


Image by Michal Jarmoluk from Pixabay

An index is a data structure that lets you quickly find the documents containing a given word or piece of text, without scanning every document. Indexes are widely used in:

→ Search engines

→ Low-latency big data analytics systems such as Druid and Pinot
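To make the idea concrete, here is a minimal Python sketch of an inverted index: a map from each term to the sorted list of IDs of the documents that contain it. The tokenizer (lowercase, split on anything that is not a letter or digit) and the sample documents are assumptions for illustration; real engines use far richer text analysis and compressed on-disk structures.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (assumption): lowercase, keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each term to the sorted list of document IDs containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    # Sorted posting lists make later intersection/merging cheap.
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical sample collection: doc ID -> text.
docs = {
    1: "Druid is a real-time analytics database",
    2: "Pinot is a real-time distributed OLAP datastore",
    3: "Search engines rely on an inverted index",
}
index = build_inverted_index(docs)
print(index["real"])             # [1, 2]
print(index["inverted"])         # [3]
print(index.get("crawler", []))  # []
```

A lookup is now a single dictionary access instead of a scan over every document, which is what makes low-latency search possible.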

A typical web search problem can be broken down into three major components:

  1. Crawling: This step gathers the necessary web content.
  2. Indexing: This step builds the index. In this article, we review the high-level steps involved in building an inverted index.
  3. Retrieval: This step fetches the required information from the documents and usually has to respond in under a second.
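Retrieval is fast because a query only touches the posting lists of its own terms. One common approach for an AND query (every term must appear) is to intersect sorted posting lists with a two-pointer merge, starting from the shortest list. The sketch below assumes a small hand-written index and whitespace-separated query terms; it is an illustration of the technique, not how any particular engine implements it.

```python
def intersect(a: list[int], b: list[int]) -> list[int]:
    """Two-pointer merge of two sorted posting lists (AND semantics)."""
    i, j, out = 0, 0, []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search_and(index: dict[str, list[int]], query: str) -> list[int]:
    """Return IDs of documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    # Process the shortest posting list first so intermediate results stay small.
    postings = sorted((index.get(t, []) for t in terms), key=len)
    result = postings[0]
    for plist in postings[1:]:
        result = intersect(result, plist)
        if not result:  # nothing can match any more
            break
    return result


# Hypothetical index: term -> sorted document IDs.
index = {
    "inverted": [3, 7, 9],
    "index": [1, 3, 4, 7, 9],
    "druid": [2, 7],
}
print(search_and(index, "inverted index"))        # [3, 7, 9]
print(search_and(index, "inverted index druid"))  # [7]
print(search_and(index, "pinot index"))           # []
```

Each intersection costs time proportional to the lengths of the two lists, independent of how many documents exist in total.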

Crawling

Crawling is the first step in any indexing process: it scans the sources for content. Before we can build an inverted index, we must first gather the document collection it will be built over. A few points to keep in mind:

  1. It should not place so much load on the web servers that it affects the actual application serving users
  2. Many crawlers are distributed systems…
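The two concerns above can be sketched as a single-process crawler: a breadth-first frontier of URLs, a "seen" set so no page is fetched twice, and a per-host politeness delay so no single server is hit too often. The in-memory `FAKE_WEB`, the `fetch` placeholder and the `min_delay` parameter are hypothetical stand-ins for real HTTP fetching and link extraction; a distributed crawler would typically split this frontier across many workers.

```python
import time
from collections import deque
from urllib.parse import urlparse

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
FAKE_WEB = {
    "http://a.example/": ("home page", ["http://a.example/x", "http://b.example/"]),
    "http://a.example/x": ("page x", ["http://a.example/"]),
    "http://b.example/": ("other site", ["http://a.example/x"]),
}


def fetch(url: str) -> tuple[str, list[str]]:
    # Placeholder (assumption) for an HTTP GET plus link extraction.
    return FAKE_WEB.get(url, ("", []))


def crawl(seeds: list[str], max_pages: int = 100,
          min_delay: float = 0.05) -> dict[str, str]:
    frontier = deque(seeds)          # URLs waiting to be fetched, BFS order
    seen = set(seeds)                # never enqueue the same URL twice
    last_hit: dict[str, float] = {}  # host -> time of our last request to it
    pages: dict[str, str] = {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        host = urlparse(url).netloc
        # Politeness: at most one request per host every min_delay seconds.
        wait = last_hit.get(host, float("-inf")) + min_delay - time.monotonic()
        if wait > 0:
            time.sleep(wait)
        last_hit[host] = time.monotonic()
        text, links = fetch(url)
        pages[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return pages


pages = crawl(["http://a.example/"])
print(sorted(pages))
```

The collected `pages` dictionary is exactly the document collection that the indexing step consumes.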
